In this visualization, giant black holes are on the cusp of merging. A team of astronomers has predicted an imminent merger in a distant galaxy.
An Empoasca leafhopper feeds on a wild tobacco plant (Nicotiana attenuata).
Leafhopper pests select host plants by detecting consequences of jasmonate
signaling, a hormonal signaling cascade elicited in response to physical
disturbances. By studying recombinant inbred lines of tobacco plants in their
native habitat, researchers have identified chemistry created by the union of
volatile and nonvolatile defenses in leaves that bolsters plants’ resistance
to leafhoppers. See page 514.
EDITORIAL


Science needs affirmative action 


As science struggles to correct systemic racism
in the laboratory and throughout academia 
in the United States, external forces press on, 
making it even more difficult to achieve equi- 
ty on all fronts—including among scientists. 
The latest example is the decision by the US 
Supreme Court to hear cases brought against 
Harvard University and the University of North Car- 
olina (UNC) at Chapel Hill challenging their right 
to use race as a factor in undergraduate admissions. 
It is sometimes easy for scientists to let colleagues 
in other disciplines engage in a debate like this, but 
the dismantling of race-conscious admissions would 
deal another blow to equity in science. The Supreme 
Court has protected affirmative action in the past, 
but the Court’s current major- 
ity of conservative justices could 
mean the end of the program. 
This is no time for the scientific 
community to stay silent. It is a 
crucial moment for science to 
mobilize against this latest as- 
sault on diversity. 

For more than 50 years in the 
United States, colleges and uni- 
versities have been using multiple 
criteria to select undergraduates, 
recognizing that a diverse student 
body is essential for the university 
to achieve its mission. I asked Pe- 
ter Henry, the WR Berkley Profes- 
sor of Economics and Finance at 
New York University, about the 
economic data on the matter. “Affirmative action cor- 
rects a market failure,” he said. “Talent is broadly dis- 
tributed across the US population, but opportunity is 
not.” The process gives deserving students a chance that 
they might not otherwise have, adding excellence to the 
higher education system. It also acknowledges that not 
all students have an equal opportunity to excel at objec- 
tive measures like standardized tests and grades, and it 
levels the playing field by giving students and universi- 
ties the chance to spotlight other important attributes 
and factors in the admissions process. 

I know something about this struggle because I was 
one of the chancellors of UNC who oversaw the admis- 
sions policies in question. When the Supreme Court 
took up the case of Abigail Fisher versus the Univer- 
sity of Texas at Austin, I submitted an amicus brief pre- 
pared by UNC’s law dean and general counsel. Fisher, 
a white student, challenged the university’s consideration
of race in its undergraduate admission process.
Denied admission in 2008, she argued that the use of 
race in this manner violated her constitutional right to 
equal protection. In the brief, it was shown convincingly 
that students chosen for admission based on a range 
of criteria, including race, ethnicity, and socioeconomic 
background, fared better than those chosen solely on 
the basis of standardized test scores and high school 
grades. This commitment to providing access to higher 
education has now landed UNC in the courts. 

All of this is bad for science. Failure to enroll a diverse 
undergraduate population has already excluded out- 
standing people from science, and limiting affirmative 
action will only make matters worse. But much more 
insidious are the messages these fights continue to 
send. It’s bad enough that science 
faculty haven’t continually up- 
dated their methods of teaching to 
ones known to be more inclusive. 
Likewise for universities and their 
processes for faculty hiring, pro- 
motion, and tenure that sustain in- 
equity. Now, on top of all that, the 
highest court in the United States 
is going to engage in a highly pub- 
lic debate over whether many of 
the country’s potential future stu- 
dents of science can enter the sci- 
entific community, continuing the 
perpetual message of exclusion. 

The cases currently before the 
court involve claims that Asian 
Americans are penalized for their 
race in admissions decisions at Harvard and UNC. As 
Jennifer Lee, Professor of Sociology at Columbia Uni- 
versity, points out in the Editor’s Blog this week, this 
misrepresents Asian American sentiment: 70% of Asian 
Americans support affirmative action, and fewer than 
10% have reported being passed over for college admis- 
sions. As Lee notes, the cases before the court will not 
address real anti-Asian bias on college campuses. 

What can scientists do to counteract all of this? Study 
the data showing that talent is broadly distributed and 
then use this evidence to help fight exclusive practices. 
It’s also important to emphasize that grades and stan- 
dardized test scores alone are insufficient selection 
criteria. But more importantly, show up this go-round. 
Students deserve to see science faculty rise up alongside 
colleagues in the humanities to support affirmative ac- 
tion. That will be a powerful message of welcome. 

-H. Holden Thorp 


H. Holden Thorp 
Editor-in-Chief, 
Science journals. 
hthorp@aaas.org;
@hholdenthorp 
“Underserved communities ... have been waiting long
enough, and they are counting on us to get this right.”


U.S. Environmental Protection Agency Administrator Michael Regan, about the 
agency's new plans to increase monitoring of industrial pollution implicated in high cancer rates. 


Edited by Jeffrey Brainard
A miner in Madre de Dios in Peru pours a mixture of mercury and gold into a container. 


ENVIRONMENT 


Gold mines flood forests with mercury 


A nearby gold rush has left protected jungles in the Peruvian
Amazon polluted with toxic mercury at levels among the world’s
highest, comparable to those in forests near major industrial
cities in China, a study has found. Small-scale, illicit gold miners
around the world, often in impoverished regions such as Peru’s
Madre de Dios, use mercury to separate gold flakes from raw
ore. The mercury is then burned off to extract the gold. In a protected 
forest near a Peruvian mining hot spot, researchers found mercury 
in tree leaves, runoff, and soil at levels up to 15 times higher than in 
nearby unforested areas, according to an article last week in Nature 
Communications. The results—the first tracking the toxic metal’s path- 
way through forests near mine sites—suggest forests act as a mercury 
sponge, concentrating and storing it. But some mercury also finds 
its way into water bodies, where it is transformed to the more toxic 
methylmercury, the researchers discovered; that chemical showed up in 
the forest’s songbirds at levels that would impair reproduction. Small- 
scale, “artisanal” gold mining recently outstripped coal burning as the 
world’s single largest source of airborne mercury pollution, annually 
releasing as much as 1000 tons. 
Watchdog chides health agency


PUBLIC HEALTH | The auditing arm of
the U.S. Congress last week slammed the
Department of Health and Human Services
(HHS) for “persistent deficiencies” in its
response to the coronavirus pandemic
and past public health emergencies. For
example, HHS still has no comprehensive 
COVID-19 testing strategy, according to a 
27 January report from the Government 
Accountability Office (GAO). The problems 
date back more than 10 years to other 
crises, including the H1N1 influenza pan- 
demic, the Zika and Ebola virus outbreaks, 
and the public health threats posed by 
natural disasters such as hurricanes. The 
failures leave the nation vulnerable to 
future viruses and weather events, GAO 
says. In tandem with the release of the 
report, GAO announced it has added HHS 
leadership and public health emergency 
coordination to its list of “high-risk” issues 
that Congress and the executive branch 
should address. The list now highlights
37 problems at more than a dozen agencies,
with some dating back to 1990.


Malaria bed nets protect long term 


PUBLIC HEALTH | Bed nets can save young 
children from malaria, but some research- 
ers have worried about a “rebound effect,” 
in which children succumb to the disease 
later in life because they lack natural 
immunity. A new, unusual follow-up study 
has dispelled those fears. Researchers 
tracked down nearly 6000 people who,
as infants or toddlers, had been part of
a study in Tanzania that measured the
efficacy of insecticide-treated bed nets 
between 1998 and 2003. Among the 
participants—young adults today—they 
found no sign of a rebound effect: Those 
who, decades ago, slept under a bed net 
more than half the time still had a 40% 
survival advantage in 2019 over those 
who slept under nets less frequently, 
according to the study in this week’s issue 
of The New England Journal of Medicine. 


Breyer shaped law on experts 


LAW | U.S. Supreme Court Justice Stephen Breyer, who last week announced
he will retire later this year, will leave a notable imprint on the use of
science in U.S. courtrooms. During his 27 years on the bench, he wrote
opinions that helped clarify how judges should decide what kinds of expert
testimony to allow. In 1999, he authored a key opinion in Kumho Tire Co. v.
Carmichael, which established that a judge’s gatekeeping authority applies
not only to testimony from witnesses who are scientists, but also to those
who are engineers or technical specialists. In a 1998 essay in Science,
Breyer argued that judges increasingly needed education about technical
issues. And in a separate, 2000 essay, he wrote that legal proceedings are
not necessarily a “search for scientific precision. ... But the law must
seek decisions that fall within the boundaries of scientifically sound
knowledge.”


The bright plane of the galactic center is crosscut by mysterious strands
in this image from a South African radio telescope array.


Telescope reveals plethora of mysterious Milky Way filaments


One of the most detailed pictures yet of the center of the Milky Way has
revealed nearly 1000 mysterious strands that slash across the plane of the
galaxy, 10 times more than previously known. The image, released last week
by South Africa’s MeerKAT radio telescope array, shows a region 25,000
light-years from Earth. Colors denote the bright radio emissions from
objects such as stellar nurseries and supernova remnants, the expanding
shells of exploded stars. The brightest spot of all is the home of the
Milky Way’s giant black hole, with a mass of 4 million Suns. But
researchers were also intrigued to find so many radio-emitting filaments,
up to 150 light-years long, cutting across the scene. They are thought to
arise from electrons moving close to the speed of light as they gyrate
around magnetic field lines. But researchers don’t know what accelerates
the electrons, why the filaments exist in regularly spaced clusters, or
what creates the magnetic field lines in the first place. Some suspect
outbursts of the black hole are responsible.


Breeding the ideal, edible worm


GENETICS | A French company last week announced it is starting the first
industrial breeding program to grow beetle larvae on a large scale as food
for humans and animals. Ynsect already grows and processes the yellow
mealworm beetle (Tenebrio molitor) to turn into powders and oils for
fishmeal and pig feed. In 2021, the company published the worm’s genome.
Now, it is working to identify strains of this species and other beetles
with desirable traits, including faster growth and reproduction, more
efficient food consumption, and pathogen resistance. Most traits involve a
complicated tangle of genes, but large-scale screening could speed the
selection, specialists say. Food specialists say mealworms could help
alleviate food insecurity. They are high in protein, and raising them emits
much less greenhouse gas than other forms of animal protein. Last year, the
European Food Safety Authority deemed the yellow mealworm safe for human
consumption.


A French company is developing the larvae of the yellow mealworm beetle
as a food source.


Awards bypass Asian researchers 


DIVERSITY | Asian scientists are mark- 
edly underrepresented among recipients 
of U.S. biomedical research prizes, an 
analysis shows. Only 6.8% of 838 awardees 
who received 14 top U.S. prizes, such as 
the Albert Lasker Basic Medical Research 
Award, are of Asian descent, even though 
they make up more than 20% of U.S. 
biomedical faculty researchers, accord- 

ing to a commentary this week in Cell. 

For Black scientists, the picture is worse: 
They make up 2.6% of biological science 
faculty, but were shut out of the prizes. 
But there’s some reason for hope: In the 
past decade, the percentage of female 
recipients of eight long-running prizes 
increased substantially, from 10% to almost 
30%, a change that may reflect efforts to 
promote gender equality. To improve racial
and ethnic diversity, award panels should
encourage self-nominations, among other
steps, says the commentary’s author,
neuroscientist Yuh Nung Jan of the University of
California, San Francisco.


Very old trees like “General Sherman,” a giant sequoia in California,
can be a forest’s insurance policy for weathering environmental changes.


Elder trees promote forest health, diversity


Ancient trees are rare but play an outsize role in helping a forest survive, says a
study that quantifies conditions under which a forest gains these old-timers.
Charles Cannon of the Morton Arboretum and colleagues analyzed published
annual death rates of forests (the percentage of trees that die each year). The
team’s simulations indicated that if the rate does not exceed 1%, about 1% of a
forest’s trees will eventually become long-lived behemoths, surviving for hundreds or
thousands of years, up to 20 times longer than the trees around them, the scientists
report this week in Nature Plants. Luck plays a large role in which trees survive
lightning, fires, chain saws, drought, and disease. But the paper suggests genetics make
the old-timers more resilient, particularly in dealing with long-term climate oscillations,
which in turn helps make the entire forest more adaptable and sustainable. It’s yet
another reason, the authors say, for protecting old-growth forests.
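
A back-of-envelope sketch of the arithmetic behind the elder-trees item: if every tree faced the same constant, independent chance of dying each year (a simplifying assumption made here for illustration, not the model used in the Nature Plants study), the fraction of a cohort still standing after n years would be (1 - d)^n for an annual death rate d. The function names below are hypothetical.

```python
import math


def survival_fraction(annual_death_rate: float, years: int) -> float:
    """Fraction of a cohort alive after `years`, assuming a constant,
    independent annual death rate (a toy model, not the study's simulation)."""
    if not 0.0 <= annual_death_rate <= 1.0:
        raise ValueError("annual_death_rate must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return (1.0 - annual_death_rate) ** years


def years_until_fraction(annual_death_rate: float, fraction: float) -> float:
    """Years until only `fraction` of the cohort remains under the same toy model."""
    if not 0.0 < annual_death_rate < 1.0:
        raise ValueError("annual_death_rate must be strictly between 0 and 1")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    # Solve (1 - d)^n = fraction for n.
    return math.log(fraction) / math.log(1.0 - annual_death_rate)


if __name__ == "__main__":
    rate = 0.01  # the 1% annual death rate mentioned in the item
    print(f"Alive after 100 years: {survival_fraction(rate, 100):.1%}")
    print(f"Years until 1% remain: {years_until_fraction(rate, 0.01):.0f}")
```

Under this toy model, a 1% annual death rate leaves about 1% of a cohort alive after roughly 460 years, which is at least consistent in scale with the centuries-old survivors the study describes.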


Tighter soot limits offer benefits 


EPIDEMIOLOGY | Tightening the air 
quality standard for particulate matter in 
the United States would prevent prema- 
ture deaths in older people, a study has 
found. It offers stronger evidence than 
previous analyses covering fewer people,
which also found that low levels of the 
small particles, measuring no more than 
2.5 micrometers wide, pose health risks. 
Researchers led by Francesca Dominici of 
Harvard University compared the health of 
68.5 million Medicare recipients, all aged
65 or older, across the United States with
their estimated exposure to air pollution
between 2000 and 2016. More than 143,000
deaths in this group could have been
avoided if the U.S. standard for particulate
matter had been 10 micrograms per cubic
meter between 2006 and 2016, instead
of the current 12 micrograms. The U.S.
Environmental Protection Agency (EPA) 
had already planned to propose a new stan- 
dard for the pollutant this spring, which is 
intended to protect people of all ages. The 
new study was released on 26 January by 
the Health Effects Institute, which is funded 
by EPA and industry groups. 


COVID-19 vaccines make strides 


PANDEMIC | Two makers of COVID-19
vaccines logged major milestones on
31 January. Moderna won full approval from
the U.S. Food and Drug Administration 
(FDA) for its messenger RNA-based 
vaccine, 13 months after the agency 
granted the company an emergency use 
authorization (EUA). It is the country’s 
second fully authorized COVID-19 vac- 
cine, after Pfizer’s, which won approval 
in August 2021. And after a monthslong 
delay caused by manufacturing issues, 
Novavax applied to FDA for an EUA for 
its protein-based vaccine. Last month,
it won conditional marketing authorization
in Europe, and the World Health
Organization granted it an emergency use 
listing, opening up an avenue to buttress 
global vaccine supplies. 


Prized dinosaur tracks damaged 


PALEONTOLOGY | A backhoe operator 
last week reportedly damaged part of 
one of North America’s largest and most 
diverse sets of early Cretaceous dinosaur 
tracks near Moab, Utah. The Mill Canyon 
Dinosaur Tracksite contains more than 
200 tracks left by at least 10 different 
species about 112 million years ago. Last 
week, work was underway to replace
a boardwalk at the location, which is
administered by the U.S. Bureau of Land 
Management (BLM). Paleontologists say 
the agency provided no notice of the 
work and had no fossil expert on site to 
monitor it; BLM’s Moab office has lacked 
a paleontologist on staff since 2018. In a 
statement this week, BLM did not explain 
the apparent damage or accept responsi- 
bility, saying only “heavy equipment is on 
location, but it is absolutely not used in 
the protected area,” and it “is committed 
to balancing resource protection and pub- 
lic access” to the site. The damage there 
was verified in person this week by Utah’s 
state paleontologist. 
Imminent merger of giant black holes predicted


Never-before-seen event could spark cosmic fireworks—if it is not a mirage 


By Daniel Clery 


Tick ... tick ... boom? In the center of
a galaxy 1.2 billion light-years from 
Earth, astronomers say they have 
seen signs that two giant black holes, 
with a combined mass of hundreds of 
millions of Suns, are gearing up for 
a cataclysmic merger as soon as 100 days 
from now. The event, if it happens, would 
be momentous for astronomy, offering a 
glimpse of a long-predicted, but never wit- 
nessed mechanism for black hole growth. 
It might also unleash an explosion of light 
across the electromagnetic spectrum, as 
well as a surge of gravitational waves and 
ghostly particles called neutrinos that could 
reveal intimate details of the collision. 

As soon as the paper appeared last week 
on the preprint server arXiv, other astrono- 
mers, eager to confirm the tantalizing sig- 
nals, rushed to secure telescope observing 
time, says team member Huan Yang of the 
Perimeter Institute in Waterloo, Canada. 
“We’ve seen people acting pretty fast,” he
says. Emma Kun of Konkoly Observatory in 
Budapest, Hungary, began to scour archives 
of radio observations for confirmation of 
the signal. “If the boom happens, it will con- 
firm many things,” she says. 

But the prediction may be a mirage. It’s not 
clear that the observed galaxy holds a pair 
of black holes, let alone a pair that’s about 
to merge, says Scott Ransom of the National 
Radio Astronomy Observatory, who finds the
presented evidence “pretty circumstantial.” 

Supermassive black holes are thought to 
lurk at the heart of most, if not all, galaxies, 
but theorists don’t know how they grow so 
big. Some sporadically suck in surrounding 
material, fiercely heating it and causing the 
galaxy to shine brightly as a so-called active 
galactic nucleus (AGN). But the trickle of 
material may not be enough to account for 
the black holes’ bulk. They could gain weight 
more quickly through mergers: After galax- 
ies collide, their central black holes could 
become gravitationally bound and gradually 
spiral together. 

Such black hole pairs are not easy to 
detect. X-ray telescopes have discovered 
a handful of AGNs with two bright, sepa- 
rated central sources, but the putative black 
holes are hundreds of light-years apart and 
wouldn’t collide for billions of years. Once 
they get closer, it’s almost impossible to sep- 
arate their light with a telescope. But some 
AGNs dim and brighten every few years—a 
sign, astronomers have argued recently, that 
they harbor pairs of black holes orbiting 
each other that regularly churn and heat the 
surrounding material. Some of these peri- 
odic oscillations have faded, however, call- 
ing into question the binary interpretation. 
“AGNs do all sorts of crazy things we don’t 
understand,” Ransom says. 

In this visualization, a pair of giant black holes is
about to merge, an event astronomers long to see.

In data from a survey telescope in California
called the Zwicky Transient Facility
(ZTF), a team led by Ning Jiang of the
University of Science and Technology of 
China stumbled on a periodic AGN called 
SDSSJ1430+2303. “My first instinct was it 
must be related to a pair of supermassive 
black holes,” Jiang says. 

Then, the researchers found something 
more: a trend they interpret as a binary pair 
closing in on a merger. The cycles were get- 
ting shorter, going from 1 year to 1 month 
in the space of 3 years. It is “the first official 
report of decaying periods which reduced 
over time,” says Youjun Lu, a theoretical 
astrophysicist at the National Astronomical 
Observatories of China, who was not part 
of the team. 

The researchers confirmed the month- 
long oscillation in x-ray observations from 
NASA’s orbiting Neil Gehrels Swift Obser- 
vatory. If this decreasing trend continues, 
the black holes, which Jiang says come as 
close to each other as the Sun is to Pluto, 
will merge in the next 100 to 300 days, they 
report in the paper, which has not been 
peer reviewed. 

If the merger comes to pass, observers 
could have a field day. “There should be a 
huge burst across the electromagnetic spec- 
trum, from gamma rays to radio,” Kun says. 
Some also expect a flood of neutrinos, which 
the IceCube detector at the South Pole— 
1 cubic kilometer of polar ice outfitted with
light sensors to detect neutrino impacts— 
could pick up. Neither kind of outburst is 
certain, however. Some predict a whimper 
rather than a bang. “We really don’t know 
what to expect,” Ransom says.

The only certain signal is gravitational 
waves, but the ponderous colliding masses 
would emit them at too low a frequency to 
be picked up by detectors such as the Laser 
Interferometer Gravitational-Wave Obser- 
vatory, which is tuned to smaller mergers. 
They should, however, leave an imprint on 
spacetime itself, a sort of relaxation of dis- 
tance and time dubbed gravitational wave 
memory, which could be detected over 
many years by monitoring the metronomic 
pulses of spinning stellar remnants known 
as pulsars. “It’s a very tricky signal to measure,”
Ransom says, “but that would be definitive,
a total smoking gun” of merging
supermassive black holes. 

But Ransom is braced for disappoint- 
ment. He points out that the team is basing 
its prediction on just a handful of observed 
cycles. Theorist Daniel D’Orazio of the Niels 
Bohr Institute in Copenhagen, Denmark, 
says some aspects of the AGN’s light curve 
also raise doubts. For example, he says, 
the ZTF archives show SDSSJ1430+2303 
lacked a periodic oscillation in the years 
before Jiang’s team discovered it; its dim, 
steady emission then looked more like a 
standard AGN with a single supermassive 
black hole. “Why has [the oscillation] just 
turned on now?” D’Orazio asks. “I’m not 
sure how that steady emission fits with bi- 
nary emission models.” 

Observations in the coming months 
should show whether the oscillation con- 
tinues to shorten. The team had to halt 
its observing in August 2021 when Earth’s 
orbit put the distant galaxy too close to 
the Sun for telescopes to observe it safely. 
Observations restarted in November, but 
since then technical glitches have idled 
both ZTF and Swift. 

Andrew Fabian of the University of Cam- 
bridge is among the astronomers who will 
be chasing the will-o’-the-wisp, having applied
for time on NASA’s Neutron star
Interior Composition Explorer, an x-ray tele- 
scope attached to the International Space 
Station. “If this is true, then it’s important 
to get as many observations as possible now 
to see what it’s doing,” he says. Fabian says 
the chance of such a merger taking place so 
close to Earth in any given year is one in 
10,000. He’s skeptical that one is imminent, 
but says it’s worth monitoring for a few 
months to see whether the claim holds up. 
“Rare events do happen,” he says. 


With additional reporting by Ling Xin in Beijing. 
Indonesia’s utopian new capital
may not be as green as it looks 


Moving the government to Borneo could speed deforestation 


By Dennis Normile 


Indonesia has yet to start building its new
capital, Nusantara, but a slick website 
shows what the country has in mind. A 
video shows people strolling on board- 
walks through lush greenery, housing 
perched on the shores of an idyllic lake, 
stunningly modernistic buildings, elevated 
mass transit lines, and bicycles on tree-lined 
boulevards. Dominating the city is a cluster of 
monumental buildings, including a presiden- 
tial palace in the shape of the mythical bird- 
like Garuda, Indonesia’s national emblem. 
The new capital, whose construction on 
Borneo’s east coast was approved by Indone- 
sia’s parliament on 18 January, will replace 
overcrowded and increasingly flood-prone 
Jakarta, on Java. Planners are envisioning 
an environmental utopia for Nusantara, 
which means “archipelago.” All residents will 
be within a 10-minute walk of green recre- 
ational spaces. Every high rise will utilize 
100% eco-friendly construction and be en- 
ergy efficient. Of trips taken within the city, 
80% will be by public transport or on foot or 
bicycle. Nusantara presents an opportunity 
“to build a model city that is respectful of the 
environment,” says Sibarani Sofian, an urban
designer with Urban+, the firm that won the 
competition for a basic design for the city’s 
governmental core. But others see shadows 
in this utopian vision. 
“The big question, of course, is how and 
if they’ll achieve these ambitions,” says Kian


Goh, who studies urban planning at the 
University of California, Los Angeles. “Plan- 
ning scholars are by and large skeptical of 
plans for smart or sustainable cities ‘from
scratch,’” she says. And spillover effects
across Borneo, including deforestation, 
“are likely to be far greater than the direct 
impacts within the city boundaries, un- 
less carefully managed,” says ecologist Alex 
Lechner of Monash University, Indonesia. 

Indonesian President Joko Widodo pro- 
posed the new capital in April 2019 and 
later that year picked the site in East Ka- 
limantan province. He wanted to move the 
capital closer to the nation’s geographic 
center and spur economic growth in the 
archipelago’s east, while easing Jakarta’s 
burden. Sprawling over nearly 6300 square 
kilometers (km²), the Jakarta metropolitan
area is Southeast Asia’s most populous
conurbation, home to more than 31 million 
people. Haphazard growth has led to noto- 
rious traffic jams and pollution. 

The old capital is also sinking. Many 
residents rely on wells that are pumping 
underground aquifers dry, leading to 
ground subsidence of more than 10 centi- 
meters annually along the northern rim of 
the city, on the shores of Jakarta Bay—even 
as sea levels rise because of climate warm- 
ing. The area, home to poor and working 
classes, floods annually. A 2020 flood killed 
more than 60 and displaced more than 
60,000. Without heroic efforts to limit the 
sinking, 25% of the capital area will be
submerged by 2050, says Edvin Aldrian, a
climatologist at Indonesia’s National Research
and Innovation Agency.


Starting from scratch


Indonesia will move its seat of government, and an estimated 4.8 million civil servants, from Jakarta to
Nusantara, a brand-new city on Borneo’s east coast that is closer to the country’s geographical center.

Plans for Nusantara include lots of green space and a palace in the shape of Garuda, a mythical birdlike creature.

Moving the seat of government and its es- 
timated 4.8 million workers won’t lighten Ja- 
karta’s burdens much, Aldrian says. “Jakarta 
will still be the economic center of Indonesia 
... and still have to take on its social issues 
and environmental issues,” Goh says.

Meanwhile, the $32 billion new capital, 
whose construction can now get underway, 
will have an environmental impact on Bor- 
neo. Nusantara, to be built in stages through 
2045, will cover 2560 km², about twice the
area of New York City. (The government
will occupy a 66-km² core.) Like the United
States, Brazil, and other countries that built 
new capitals from scratch, Indonesia hopes 
to create a city that is modern, rationally 
planned, and—in Indonesia’s case—green, 
with net-zero emissions. But critics are 
skeptical, because Indonesia’s renewable 
energy sector currently provides just 11.5% 
of national energy. Environmental groups 
worry that as a stopgap Nusantara could 
rely on power from Kalimantan’s numerous 
coal-fired power plants. And although well- 
designed public transport might keep cars 
off its roads, there will likely be extensive 
air travel between the new capital and Ja- 
karta, about 1300 kilometers away. 

The impact on Borneo’s ecology could be 
substantial. An island the size of California, 
Borneo features coastal mangroves, forests, 
swamps, and mountains, hosting numerous 
endemic and rare species. Nusantara itself 
will be built on a previously cleared site and 
rely on existing highways, power lines, and 
other infrastructure. The city also lies inland, 
allowing for shoreline mangrove restora- 
tion. River valleys will be protected, creating 
what Lechner calls “green fingers” reaching 
through the city. 

But the worry is that Nusantara will trig- 
ger sprawl beyond the city limits and devel- 
opment across Borneo. Spurring economic 
growth is, after all, one of the goals. By studying
the increase in nighttime lights associated
with 12 previously relocated capitals,
including Brasilia and Naypyidaw, Myanmar, 
Lechner and his colleagues found that they 
burgeoned initially, then grew more slowly. 
“Our assessment suggests that it is likely that 
[Nusantara’s] direct footprint could grow 
rapidly, expanding over 10 kilometers from 
its core in less than two decades and over 
30 kilometers before mid-century,” the team
reported in 2020 in the journal Land. 

The impacts are likely to go farther afield. 
The roads connecting Brasilia to Brazil’s 
coastal population centers “facilitated the destruction
of the Amazon rainforest,” Lechner
says, opening undisturbed territory to wild- 
life poaching, illegal logging, and land clear- 
ing. There are fewer cities to connect to on 
Borneo, home to only 18 million people, but 
“clearly the new city will attract economic ac- 
tivity, including new roads, which are known 
to cause deforestation,” says David Gaveau, a 
landscape ecologist who heads TheTreeMap, 
a company that studies tropical deforestation.

Kalimantan, the Indonesian part of Bor- 
neo, has already lost about 30% of its origi- 
nal forest cover to land clearing and fires 
since 1973, Gaveau says, leaving the Bornean 
orangutan and the proboscis monkey endan- 
gered and many other species threatened. 
A highway under construction called the 
Trans-Kalimantan Northern link “cuts right 
through remote pristine forests in the heart 
of Borneo,” he says. Encouragingly, more effective
law enforcement and a moratorium
on new plantations helped drive 2020 defor- 
estation to its lowest level in 17 years, Gaveau 
says, but new roads to and from Nusantara 
could reverse the trend. 

The Indonesian government has not 
said much about Nusantara’s environmen- 
tal burden. Gaveau and others hope it will 
offset the city’s impact with a similarly am- 
bitious effort to turn the tide elsewhere in 
Kalimantan. “The solution lies in restoring 
all those degraded lands back to their original
state: forest,” Gaveau says.


COVID-19 


New Omicron begins to take over, despite late start

BA.2 strain may extend latest surge, but its overall impact remains unclear


By Meredith Wadman 


On 7 December 2021, as the Omicron
variant of the pandemic coronavirus
began to pummel the world, scientists
officially identified a related strain.
BA.2 differed by about 40 mutations
from the original Omicron lineage,
BA.1, but it was causing so few cases of 
COVID-19 that it seemed a sideshow to its 
rampaging counterpart. 

“I was thinking: ‘BA.1 has the upper 
hand. We’ll never hear again from BA.2,’”
recalls Mark Zeller, a genomic epidemio- 
logist at the Scripps Research Institute. 
Eight weeks later, he says, “Clearly that’s 
not the case. ... I’m pretty sure [BA.2] is going
to be everywhere in the world, that it’s
going to sweep and will be the dominant 
variant soon in most countries if not all.” 

Zeller and other scientists are now try- 
ing to make sense of why BA.2 is explod- 
ing and what its emergence means for the 
Omicron surge and the pandemic overall. 
Already a U.K. report issued last week and 
a large household study from Denmark 
posted this week as a preprint make it clear 
BA.2 is inherently more transmissible than 
BA.1, leaving scientists to wonder which of 
its distinct mutations confer an advantage. 

But so far, BA.2 does not appear to be 
making people sicker than BA.1, which it- 
self poses less risk of severe disease than 
variants such as Delta and Beta. In Den- 
mark, where by 21 January BA.2 accounted 
for 65% of new COVID-19 cases, “We see 
a continuous, steep decline in the num- 
ber of intensive care unit patients and ... 
now a decrease in the number of hospital 
admissions related to SARS-CoV-2,” says 
Tyra Grove Krause, an infectious disease 
epidemiologist at the country’s public 
health agency. In fact, the Danish govern- 
ment is so confident the variant won’t 
cause major upheaval that it lifted almost 
all pandemic restrictions on 1 February.

Still, some scientists predict BA.2 will 
extend Omicron’s impact. “I would guess 
we'll see [BA.2] create a substantially lon- 
ger tail of circulation of Omicron than 
would have existed with just [BA.1], but 
that it won’t drive the scale of epidem- 
ics we've experienced with Omicron in 
January,” computational biologist Trevor
Bedford of the Fred Hutchinson Cancer 
Research Center tweeted on 28 January. In 
South Africa, BA.2 already may be stalling 
the rapid decline in new infections seen af- 
ter the country’s Omicron wave peaked in 
December 2021. 

Although BA.2 represented less
than 4% of all Omicron sequences in
the leading global virus database as of
30 January, it has been identified in
57 countries, with the earliest documented
case dating to 17 November
in South Africa. It likely now
dominates in India, according
to Bijaya Dhakal, a molecular
biologist at the Sonic Reference
Laboratory in Austin, Texas,
who examined sequence data
uploaded from eight large Indian
states. In the United Kingdom,
the proportion of likely
BA.2 cases doubled from 2.2%
to 4.4% in the 7 days that ended
on 24 January.

In the United States, the Centers
for Disease Control and
Prevention is not yet tracking
BA.2 separately. But Bedford estimates
it accounted for 7% of
new U.S. cases as of 30 January,
up from 0.7% on 19 January. “In
each country and across time,
we see that the epidemic growth
rate of Omicron BA.2 is greater
than Omicron BA.1,” he says.

The report last week from the UK Health
Security Agency (UKHSA) backs up that
assessment in England, finding BA.2 was
spreading faster than BA.1 in all regions
where enough data were available to make
an assessment. UKHSA data also show that
in late December 2021 and early January,
transmission was higher among household
contacts of BA.2 cases, at 13.4%, than in
contacts of other Omicron cases (10.3%).

The study from Denmark, which sequences
the virus from virtually every
person who gets COVID-19, paints a more
dramatic picture. In households where the
first case was BA.1, on average 29% of other
people in the household became infected.
When the first case was BA.2, 39% of household
members were infected.

Omicron was already known to have
mutations that help it evade antibodies,
but the Danish researchers also found that
BA.2 may be even better at dodging vaccine-induced
immunity: Vaccinated and boosted
people were three times as susceptible to
being infected with BA.2 as with BA.1. Vaccinated
but unboosted people were about
2.5 times as susceptible, and unvaccinated
people 2.2 times as susceptible. Early U.K.
data, however, showed vaccinated people,
if boosted, had about the same level of protection
against symptomatic infections with
BA.1 or BA.2—63% and 70%, respectively.

In one hopeful and unexpected finding
from Denmark, those who were vaccinated
or vaccinated and boosted passed on BA.2
to household members less often, relative
to BA.1. The same didn’t hold for unvaccinated
people, who passed BA.2 to
their household contacts at 2.6 times the
rate they passed BA.1.

Much as scientists a few weeks ago wondered
whether a previous infection with
Delta or another variant would protect
people from Omicron overall, some are
now looking for data on whether Omicron’s
first surge created a shield against
BA.2. “To what extent does a BA.1 infection
protect you against reinfection with BA.2?”
Zeller asks. “From what I have seen in Denmark,
it’s not going to be 100%.”

Scientists are also probing the variant’s
ability to dodge vaccine-induced antibodies
in lab dish studies. And drugmaker
GlaxoSmithKline is testing its monoclonal
antibody, sotrovimab, made with Vir Biotechnology,
against BA.2 in lab studies. It’s
the only widely authorized antibody that
still thwarts BA.1.

Scientists note BA.1 and BA.2 are about
as far apart on the evolutionary tree as earlier
variants of concern—Alpha, Beta, and
Gamma—are from each other (see graphic,
below). Some even think BA.2 shouldn’t
even be considered Omicron. “I hope in the
near future that BA.2 gets its own variant
of concern [label] because people assume
it’s very similar which it’s not,” Zeller says.

Not so similar
As this SARS-CoV-2 evolutionary tree suggests, the BA.1 and BA.2 strains of
the Omicron variant are about as far apart genetically as some earlier variants are
from one another.

BA.2 doesn’t have all of the mutations
that help BA.1 avoid immune detection,
but it has some its sibling doesn’t. Thomas
Peacock, a virologist at Imperial College
London, notes that most of the differences
are in an area of the spike protein, called
the N-terminal domain (NTD), that houses
antibody targets. “What we don’t know is:
Just because there are changes, are they
changes that actually do something?” says
Emma Hodcroft, a molecular epidemiologist
at the University of Bern.

But one NTD difference—a deletion
at amino acids 69 and 70 that
is present in BA.1 and not in
BA.2—could give researchers a
tool for monitoring the spread
of the up-and-coming Omicron
strain. Certain SARS-CoV-2
polymerase chain reaction tests
detect three genetic sequences
of the virus, but the mutation
in BA.1’s NTD gene eliminates
one of those targets. Polymerase
chain reaction tests pick up all
three targets in BA.2, providing
a proxy for distinguishing the
Omicron strains if there is no
full virus sequence.
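
The proxy just described comes down to a simple rule, sketched here in Python for illustration. The target names ("S", "N", "ORF1ab") and the function name are assumptions, not details from the reporting, and the rule only makes sense for samples already presumed to be Omicron.

```python
def classify_omicron_strain(targets_detected):
    """Guess the Omicron strain from which PCR targets were detected.

    `targets_detected` maps a target name to True or False. The three
    names used here ("S", "N", "ORF1ab") are illustrative assumptions.
    """
    expected = {"S", "N", "ORF1ab"}
    if set(targets_detected) != expected:
        raise ValueError("expected results for exactly three targets")

    # If either non-spike target is missing, the test itself is in doubt.
    if not (targets_detected["N"] and targets_detected["ORF1ab"]):
        return "inconclusive"

    # BA.1's deletion removes the spike target; BA.2 keeps all three.
    return "likely BA.2" if targets_detected["S"] else "likely BA.1"


print(classify_omicron_strain({"S": False, "N": True, "ORF1ab": True}))
```

A result from this kind of rule is only a screening signal; a full virus sequence remains the way to confirm the strain.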

How the sibling strains were 
born is also preoccupying scien- 
tists. Viral evolution in a single 
immunocompromised patient 
is one theory, says Andrew 
Rambaut, an evolutionary
biologist at the University of 
Edinburgh. “It’s possible that 
long-term infection could produce quite a 
lot of diversity within a single individual. 
It could be compartmentalized. So dif- 
ferent variants living in different parts 
of the body.” Both Omicron strains could 
have also evolved in animals infected with 
human-adapted SARS-CoV-2, then spread 
back into people. 

Why BA.2 is emerging only now is one 
more mystery, Hodcroft says. She speculates 
that a factor as simple as which Omicron 
caught an earlier flight out of South Africa, 
where both strains were first identified, 
may be the explanation. “BA.2 may have 
just been trapped for a little bit longer. But 
when it did finally get out and start spread- 
ing it started to show that it can edge out its 
big sister.” 


With reporting by Kai Kupferschmidt.
CONSERVATION BIOLOGY


Massive wolf kill disrupts long-running Yellowstone park study


Hunters kill more than 500 wolves in surrounding states 


By Virginia Morell 


Hunters are killing gray wolves in
the northern Rocky Mountains in
numbers not seen since the animals
were nearly driven to extinction in
the continental United States in the
20th century. The recent killing of

some 500 wolves in Montana, Idaho, and 
Wyoming—including nearly 20% of the 
wolves that use Yellowstone National 
Park—threatens to undermine a decades- 
old effort to restore the predators to the 
landscape and disrupt a long-term Yellow- 
stone research project that has produced 
influential findings on how wolves help 
shape ecosystems. Researchers and environ- 
mentalists are calling on officials to rethink 
the hunts, which have eliminated more 
than 15% of the wolves in the three states. 
The loss of the Yellowstone wolves 
“is a huge setback,” says biologist Doug 
Smith of the National Park Service, 
who leads the park’s wolf study, which 
began in 1995. “We had in Yellowstone 
one of the best models for understanding
the behaviors and dynamics of a
wolf population unexploited by humans.”
Now, he says, researchers will
“do what we can to keep the science
going—what we have left of it.”

For decades, wolves were strictly 
protected under the federal Endan- 
gered Species Act (ESA), but 10 years 
ago successful restoration efforts 
prompted federal officials to ease pro- 
tections and give state governments a 
greater say in managing the species. 
With wolf numbers in the northern 
Rockies reaching about 3100 in late
2020, several states have legalized or expanded
wolf hunts. Legislators in Montana,
for example, last year set a goal of shrink- 
ing the state’s wolf population to “at least 
15 breeding pairs,” the minimum required by 
the ESA; state rules allow a person to kill up 
to 20 wolves each season. Idaho also aims to 
shrink its wolf population and has no kill lim- 
its. Wyoming has nearly achieved its goal of 
maintaining just 100 wolves and 10 breeding 
pairs outside of Yellowstone (where hunting is 
not allowed). 

Biologists say the killings won’t cause the 
regional extinction of wolves, although the 
U.S. Fish and Wildlife Service announced
in fall of 2021 that it would review whether 
“potential increases in human-caused mor- 
tality” threaten the species. The losses will, 
however, alter the social structure of wolf 
packs—and reshape the Yellowstone study,
which has produced high-profile findings on
how the return of wolves has affected willow,
aspen, and cottonwood stands as well as elk,
songbird, and scavenger populations. As of
31 January, hunters had killed 24 of the
roughly 125 wolves that use the park, including
five that carried tracking collars placed
by scientists. Hunters killed 19 in Montana
outside the park’s northern borders, where
officials had recently lifted quotas, and five in
Idaho and Wyoming.

Nowhere to hide
Hunters have killed many more wolves that use Yellowstone National
Park (YNP) during the 2021-22 hunting season than in past seasons.
(Chart axis: Number of YNP wolves killed.)

Wolves feed on a bison carcass in
Yellowstone National Park.

Hunters have previously killed park wolves, 
up to seven each year from 2009 to 2020 in 
Montana. But the big new kill “complicates 
the research as we will now have to account 
for the confounding effects of hunting,” says 
ecologist Dan MacNulty of Utah State Univer- 
sity, who studies how wolves affect food webs. 

Smith believes it will take 4 or 5 years for 
the park’s packs to rebound from the losses, 
and is now planning to study how hunt- 
ing affects the wolves. Last week, he placed 
tracking collars on wolves in the north- 
ern part of the park, hoping to “compare 
the persistence and reproductive rates” of 
packs that have lost members with those 
that haven’t. Studies have found packs 
with more than eight members “are more 
resilient” to diseases such as mange, Smith 
notes, and can “have greater prey kill rates 
and are better at territorial defense.” 

Smith expects the park’s wolves, which 
are a favorite attraction of visitors, to be- 
come warier and more difficult to see. 
Yellowstone superintendent Cam Sholly
has said he wants “to make the case” to 
Montana officials “for reinstating quotas 
that would protect the [park’s] core wolf 
population.” But state leaders have shown 
little interest, and Montana’s wildlife com- 
mission recently declined to end hunting 
in certain areas near the park. “We 
don’t manage for individual wolves 
or packs. We manage wolves across 
landscape and population scales,” 
says Greg Lemon, a spokesperson for 
Montana’s wildlife department. 

Wolf advocates fear such stances 
will mean fewer wolves available to
start new packs elsewhere. (Still, one
nearby state, Colorado, is advancing
a plan to restore the canids.) And
they argue states are pursuing contradictory
policies. Montana and Wyoming,
for example, want to reduce
elk populations that officials say have
grown too large. But they’ve also embraced
killing a predator that could
help them reach that goal. “Their
management objectives,” MacNulty
says, “are at cross purposes.”
MICROBIOLOGY


Computer scan uncovers 100,000 new viruses 


Clues to future outbreaks may be hidden in existing genomic databases 


By Elizabeth Pennisi 


It took just one virus to cripple the
world’s economy and kill millions of
people; yet virologists estimate that
trillions of still-unknown viruses exist,
many of which might be lethal or have
the potential to spark the next pandemic.
Now, they have a new—and very long—list of 
possible suspects to interrogate. By sifting 
through unprecedented amounts of exist- 
ing genomic data, scientists have uncovered 
more than 100,000 novel viruses, including 
nine coronaviruses and more than 300 re- 
lated to the hepatitis Delta virus, which can 
cause liver failure. 

“It’s a foundational piece of work,”
says J. Rodney Brister, a bioinformati- 
cian at the National Library of Medi- 
cine. The study, published last week in 
Nature, expands the number of known 
viruses that use RNA instead of DNA 
for their genes by an order of magni- 
tude. It “demonstrates our outrageous 
lack of knowledge about this group of 
organisms,” says disease ecologist Peter 
Daszak, president of the EcoHealth Al- 
liance, a nonprofit research group in 
New York City that is raising money to 
launch a global survey of viruses. 

Scientists predict the study will 
also help launch so-called petabyte 
genomics—the analyses of previously 
unfathomable quantities of DNA and 
RNA data. (One petabyte is 10^15 bytes.)
That wasn’t exactly what computa- 
tional biologist Artem Babaian had 
in mind when he came up with the proj- 
ect while in between jobs in early 2020. 
Instead, he was simply curious about how 
many coronaviruses—aside from the virus 
that had just launched the COVID-19 pan- 
demic—could be found in sequences in ex- 
isting genomic databases. 

So, he and independent supercomput- 
ing expert Jeff Taylor scoured cloud-based 
genomic data that had been deposited to a 
global sequence database and uploaded by 
the U.S. National Institutes of Health. As of 
now, the database contains 16 petabytes of 
archived sequences, which come from ge- 
netic surveys of everything from fugu fish, 
the risky Japanese delicacy, to farm soils to 
human guts. (A database with a 5-megabyte
digital photo of every person in the United
States would take up about the same amount
of space.) The sequences also capture the
genomes of viruses infecting different or- 
ganisms in samples, but the viruses usually 
go undetected. 

To sift through the reams of data, Babaian 
and Taylor devised a set of computer search 
tools specialized for cloud-based data. With 
the help of several bioinformaticians, some
of whom became collaborators on the project,
they tweaked the new software to make 
their analysis “way faster than anyone 
thought possible,” recalls Babaian, who is 
now at the University of Cambridge. 

They soon expanded the viral hunt be- 
yond coronaviruses and looked at all the 


In a vast repository of genetic sequences, scientists found nine 
unknown coronaviruses, relatives of SARS-CoV-2 (computer model). 


data in the cloud. Babaian and his colleagues’
programs hunted among the
cloud’s sequences for matches to the central 
core of the gene for RNA-dependent RNA 
polymerase, which is key to the replication 
of all RNA viruses. Such viruses include 
not only coronaviruses, but also those that 
cause flu, polio, measles, and hepatitis. 
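
A toy version of that kind of search, for illustration only: the Python sketch below translates a nucleotide read in three forward reading frames and looks for the short amino acid motif GDD, which is conserved in many viral RNA-dependent RNA polymerases. The motif choice, the function names, and the example read are assumptions. The team's actual tools are not described here and presumably do far more than match three letters.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate a DNA or RNA string starting at offset `frame` (0, 1, or 2)."""
    seq = seq.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are shown as "X".
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_motif(seq, motif="GDD"):
    """Return (frame, protein_position) for every motif hit in three frames."""
    hits = []
    for frame in range(3):
        protein = translate(seq, frame)
        start = protein.find(motif)
        while start != -1:
            hits.append((frame, start))
            start = protein.find(motif, start + 1)
    return hits


# Hypothetical read: ATG GGT GAT GAT TAA translates to M G D D *.
print(find_motif("ATGGGTGATGATTAA"))
```

A real scan would also need to check the reverse strand and score longer matches against known polymerase sequences, since a three-residue motif turns up by chance.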

Babaian’s approach was fast enough to 
work through 1 million data sets a day— 
at a computing cost of less than 1 cent per 
data set. “It’s an impressive engineering 
feat,” says C. Titus Brown, a bioinformati- 
cian at the University of California, Davis. 
When the researchers were finally finished, 
they had uncovered the partial genomes of 
almost 132,000 RNA viruses. 

The group’s new database doesn’t have 
the complete sequence of each new virus— 


in many cases, there’s just the gene for the 
core enzyme. But researchers can use even 
partial sequences to build family trees that 
reveal how different viruses are related. In 
some cases, they can also use the database to 
find out where around the world a particu- 
lar virus was found—and what type of host 
it was in. And some of the discovered viruses 
could help researchers better understand 
how human pathogens arise, Brown says, 
or improve diagnostic tests for infections. 
Finally, when a new virus is isolated from 
a sick patient, a scan of the genomic da- 
tabase could show whether it was already 
present elsewhere. “We have turned this 
[database] into a giant virus surveil- 
lance network,” Babaian says. 

Some findings were unexpected, 
including new coronaviruses in the 
well-studied fugu fish and in the 
axolotl, an amphibian that is a com- 
mon lab organism. In a few cases, 
researchers could piece together 
whole genomes for the viral finds. 
And in some aquatic animals, those 
sequences suggested their novel 
coronavirus genomes are spread 
across two separate RNA molecules, 
not the usual single strand, Babaian 
and his colleagues report. 

Babaian’s team also came across 
evidence of more than 250 giant 
bacteriophages—viruses that infect 
bacteria—that resemble ones already
known in algae. These “huge
phages” were detected in sequences 
from vastly different organisms. One 
group of huge phages was found in a per- 
son in Bangladesh and also in cats and 
dogs in the United Kingdom, for example. 
These viruses are big enough to carry genes 
between different host species, suggesting
they might provide a new source of genetic 
changes, Babaian notes. That’s the way it is 
with viruses, Daszak says. “Every time we 
start digging, we get surprises.” 

To make sure others can take advantage of 
the work, Babaian’s team has created a pub- 
lic repository of the tools it developed, along 
with the results. The amount of cloud-based, 
publicly available DNA sequences is expand- 
ing exponentially; if he did the same analy- 
sis next year, Babaian expects he would find 
hundreds of thousands more RNA viruses. 
“By the end of decade, I want to identify over 
100 million.”
FAILING THE TEST


DNA barcoding brought botanist Steven Newmaster 
scientific fame and entrepreneurial success. Was it all based on fraud? 


By Charles Piller 


Almost overnight, a 2013 paper made Steven Newmaster an expert on the verification of food and supplements. 


484 4 FEBRUARY 2022 + VOL 375 ISSUE 6580 science.org SCIENCE 


PHOTO: ANDREW FRANCIS WALLACE/TORONTO STAR VIA GETTY IMAGES 


Corrected 3 February 2022. See full text. 


In 2013, a team led by Steven Newmaster,
a botanist at the University of Guelph 
(UG), took a hard look at popular herbal 
products such as echinacea, ginkgo 
biloba, and St. John’s wort. The team 
published a study that used DNA 
barcoding—a system to identify species 
using small, unique snippets of genetic 
material—to test whether the bottles re- 
ally contained what was printed on the label. 

The results were troubling. Most of the 
tested products contained different plants, 
were larded with inert fillers, or were tainted 
with contaminants that could cause liver and 
colon damage, skin tumors, and other seri- 
ous health problems. The paper, published in 
BMC Medicine, received prominent attention 
from The New York Times, CBC, and many 
other media outlets. The findings “pissed me 
off,” Newmaster told PBS’s Frontline. “I go in
to buy a product that I believe in, that I care 
about and I pay a lot of money for, and it’s 
not even in the bottle? Are you kidding me?” 

His work inspired then-New York Attor- 
ney General Eric Schneiderman to sponsor 
a similar study conducted by James Schulte, 
then at Clarkson University, who confirmed 
that consumers were often misled. At 
Schneiderman’s request, major retailers such 
as GNC, Walgreens, and Walmart pledged 
to pull suspect products from the shelves or 
take other measures. 

Almost overnight, Newmaster became an 
authority on the verification of food and sup- 
plement ingredients. He quickly went from in- 
dustry adversary to ally, as major supplement- 
makers hired companies he created to cer- 
tify their products as authentic. In 2017, 
Newmaster also founded the Natural Health 
Products Research Alliance (NHPRA), a ven- 
ture within UG that aims to improve certifi- 
cation technologies for supplements. It raised 
millions of dollars from herbal suppliers, 
boosting UG’s finances and prestige. 

But in an ironic twist, eight experts in DNA 
barcoding and related fields now charge that 
the 2013 paper that indicted an entire indus- 
try and launched a new phase in Newmaster’s 
career is itself a fraud. In a 43-page allega- 
tion letter, sent to UG in June 2021 and ob- 
tained by Science, the researchers—from UG, 
the University of Toronto, the University of 
British Columbia, and Stanford University— 
cited major problems in the study and two 
others by Newmaster and collaborators. 
“The data which underpin [the papers] are 
missing, fraudulent, or plagiarized,” the letter
flatly stated. The group also charged that
Newmaster “recurrently failed to disclose 
competing financial interests” in his papers. 

The accusers include co-authors of two of 
the suspect papers, who now say they believe 
Newmaster misled them. “I felt that trust 
was betrayed,” says one of them, John Fryxell, 
executive director of the Biodiversity Institute
of Ontario. One paper, which compared
the cost of DNA barcoding with traditional 
methods for cataloging forest biodiversity, 
was retracted last fall at the request of its 
junior author, Ken Thompson, now a Stan- 
ford postdoctoral fellow. The letter was also 
signed by evolutionary biologist Paul Hebert, 
sometimes called the “father of DNA barcod- 
ing,” who directs UG’s Centre for Biodiversity 
Genomics (CBG). 

Newmaster did not respond to interview 
requests or written questions. But in a de- 
fense he sent to UG—which Science has also 
obtained—he denied all charges. “I have 
never committed data fabrication, falsification,
plagiarism, or inadequate acknowledgment
in the publications as claimed,”
Newmaster wrote. “I have never engaged 
in any unethical activity or academic mis- 
conduct.” He also said he had never made 
money from his network of businesses. 

An investigation by Science found the prob- 
lems in Newmaster’s work go well beyond the 
three papers. They include apparent fabrica- 
tion, data manipulation, and plagiarism in 
speeches, teaching, biographies, and scholarly
writing. A review of thousands of pages
of Newmaster’s published papers, conference 
speeches, slide decks, and training and pro- 
motional videos, along with interviews with 
two dozen current and former colleagues 
or independent scientists and 16 regulatory 
or research agencies, revealed a charismatic 
and eloquent scientist who often exagger- 
ated, fabulized his accomplishments, and 
presented other researchers’ data as his own. 

UG, which has been investigating the al- 
legations since August 2021, declined to an- 
swer questions about its own investigation or 
Science’s findings, citing confidentiality rules. 
Other UG scientists say university adminis- 
trators repeatedly pressured them to stop 
questioning Newmaster’s research. UG also 
dismissed a detailed request for an investiga- 
tion made by Thompson in 2020. Some now 
fear university administrators will quash 
the new accusations in a misguided attempt 
to protect UG’s and their own reputations, 
and the university’s share of funds raised by 
Newmaster. UG declined to comment on 
those concerns, as well. 

“The 2013 herbal supplement paper reflects
a pattern of deception and academic
misconduct. The university has chosen to
stand back for reasons that I don’t understand,”
Hebert says. “I am disturbed to sit in
a building where someone has been running 
a fabrication mill.” 


ON SOCIAL MEDIA, Newmaster described 
himself as a scientific “explorer” and “ad- 
venturer.” His Instagram page showed him 
skiing double black diamond runs, riding 
dog sleds, and inspecting tea fields in China. 
(Newmaster’s Instagram account became 
private after Science contacted him.) 

According to his CV and LinkedIn page, 
Newmaster joined UG’s faculty in 2001 or 
2002, after earning a Ph.D. in environmen- 
tal biology and ecology at the University of 
Alberta, and became curator of an herbar- 
ium housed at UG. His intrepid character, 
personal appeal, and ability to put people 
at ease charmed colleagues. Environmental 
physiologist Patricia Wright, retired from 
UG, describes him as “an upbeat, fun guy 
that students really liked.” 

Not long after Newmaster arrived, semi- 
nal work by Hebert and others helped 
launch DNA barcoding as an important re- 
search tool with diverse applications such 
as cataloging biodiversity and monitoring 
water quality. Hebert raised funds to build 
a small barcoding empire at UG, with scores 
of researchers and two buildings, one of 
which became home to the herbarium and 
Newmaster’s personal lab. Hebert also co- 
founded and serves as scientific director for 
the Barcode of Life Data System (BOLD), 
a repository with millions of barcodes for 
more than 300,000 named species. 

Newmaster embraced the technology. He 
has used it not only to authenticate medici- 
nal plants, but also to study plant diversity 
in Canada and India and catalog threatened 
tree species. Much of the DNA work was 
carried out by Subramanyam Ragupathy, a 
botanist in Newmaster’s lab who did not re- 
spond to requests for an interview. 

In the 2013 supplement paper, Newmaster, 
Ragupathy, and collaborators describe how 
they derived DNA barcodes for 44 popular 
herbal products and compared them with 
barcodes from validated sources. The ex- 
plosive results—most of the products had 
DNA from herbs not on the label, and many 
contained plants with “known toxicity”—
alarmed experts. “This suggests that the 
problems are widespread and that qual- 
ity control for many companies, whether 
through ignorance, incompetence or dishonesty,
is unacceptable,” nutritionist David
Schardt, then with the Center for Science in 
the Public Interest, told The New York Times. 
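
To make the comparison step concrete, here is a minimal Python sketch of matching a product's barcode against a reference library by percent identity. The species labels, sequences, and 97% threshold are invented for illustration, and the sketch assumes equal-length, pre-aligned barcodes. None of it is the paper's data or method.

```python
def percent_identity(a, b):
    """Percentage of positions at which two pre-aligned sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query, references, threshold=97.0):
    """Return (label, score) for the closest reference barcode.

    The label is None when even the best score falls below `threshold`.
    """
    label, ref = max(
        references.items(),
        key=lambda item: percent_identity(query, item[1]),
    )
    score = percent_identity(query, ref)
    return (label, score) if score >= threshold else (None, score)


# Hypothetical reference library; real barcodes are much longer.
REFERENCES = {
    "labeled herb (hypothetical)": "ACGTACGTACGTACGTACGT",
    "filler plant (hypothetical)": "ACGTTCGTACCTACGAACGT",
}

print(best_match("ACGTTCGTACCTACGAACGT", REFERENCES))
```

A match of this kind says only which reference a sequence most resembles. It does not say how much of that plant is in the bottle.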

The paper drew criticism as well. A sting- 
ing 2013 analysis in HerbalEGram—a jour- 
nal of the American Botanical Council, a 
nonprofit research group—claimed many
egregious errors and called for a retraction. 
The analysis accused Newmaster of not un- 
derstanding that supplements use benign 
inactive substances such as rice powder as 
“carriers,” and that DNA can be destroyed 
during processing without altering a supple- 
ment’s effects. 

Newmaster and his co-authors offered a 
solution to the problem they had identified. 
“We suggest that the herbal industry should 
voluntarily embrace DNA barcoding,” they
wrote in the paper, to give companies “a com- 
petitive advantage as they could advertise 
that they produce an authentic, high quality 
product.” CBG scientist Masha Kuzmina, who 
cosigned the allegations against Newmaster, 
says the message was: “The paper is out. It’s 
[a] scandal. Now there is a problem and it 
needs to be solved. And who’s solving it? The 
same person” who exposed the problem. 

Although the paper claimed “no competing
interests,” Newmaster and UG geneticist
Robert Hanner in 2012 had created Biological 
ID Technologies Inc., which conducted DNA 
barcoding for foods and herbal products and 
offered purity certifications for product la- 
bels. On 11 July 2013, about 1 week after the 
paper was submitted, Newmaster and Hanner 
incorporated a second company, named Tru- 
ID, which apparently assumed the business 
initiated by Biological ID Technologies. (Tru- 
ID folded in 2020, under “financial hardship 
during the pandemic,” Newmaster said in 
his response to the misconduct complaint. 
Hanner would not provide any comment for 
this article.) 

When the New York attorney general’s 
probe triggered by Newmaster’s paper pres- 
sured companies to validate their ingredi- 
ents, Tru-ID was ready to help, says Stefan 
Gafner, chief science officer at the Ameri- 
can Botanical Council and co-author of the 
HerbalEGram critique. At least three major 
supplement makers, Nature’s Way, Herbalife
Nutrition, and Jamieson, hired Tru-ID and 
adopted its certifications. (The company 
also received more than $369,000 in con- 
tributions and contracts from the Canadian 
government.) “The whole way [Newmaster] 
would talk about DNA was really a marketing 
pitch for the industry. And eventually, he got 
a lot of success,” Gafner says. 

In the years after the paper was published, 
Newmaster acknowledged that critics had 
been partially correct. His methods could not 
accurately measure the components of herbal 
remedies, largely because DNA barcoding 
cannot distinguish varying amounts of dif- 
ferent substances in a mixed sample, and be- 
cause DNA degrades during processing. 

NHPRA, the UG-based alliance 
Newmaster launched in 2017, aimed to im- 
prove practices in the nascent field, in part 
by combining DNA barcoding with other
approaches. The website of UG’s office of 
alumni affairs and development says the 
university is “raising $20 million to create 
new verification standards and develop 
new technology” through NHPRA and of- 
fers sponsorship levels from $25,000 to 
$1 million. Several big industry players have 
joined; Tru-ID also committed $500,000.

UG repeatedly touted Newmaster’s work
in press releases and pushed back when 
that work was challenged. In 2017, Jonathan 
Newman, then-dean of the College of Bio- 
logical Sciences, called UG scientists Evgeny 
Zakharov and Natalia Ivanova into his office
for what Zakharov sarcastically calls a
“friendly discussion.” The two scientists had
indirectly questioned Newmaster’s work at a
conference, noting that DNA barcoding alone
can’t always reliably identify ingredients in
herbal products. Newman admonished them
to avoid comments that might sour NHPRA
contributors, says Zakharov, who is lab director
for the Canadian Centre for DNA Barcoding.
“I said to Newman, ‘Are you sure you
are backing the right horse?’” Zakharov says.
“Newman’s response was: ‘You’re not the one
who brought me a $1 million deal.’”


“The university has chosen
to stand back for
reasons that I don’t understand.”

Paul Hebert, University of Guelph

Ivanova, who’s now at a Guelph biomoni- 
toring company, confirms the conversation 
and says Newman contacted her again later 
that year for a similar talk, also attended by 
Glen Van Der Kraak, who became interim 
dean in 2019. “I felt that I could not say no” 
to the requests, Ivanova says. The encounter 
gave her “the feeling that every step is being
watched for any critique towards technologies
used by Newmaster’s lab.”

Newman, now vice president for research 
at Wilfrid Laurier University, says he con- 
nected Newmaster with Herbalife and sup- 
ported his fundraising. He says he did not tell 
the duo what to say in public but asked them 
not to solicit companies Newmaster was 
courting. (Zakharov and Ivanova say they 
had never engaged in fundraising.) Van Der 
Kraak declined to comment. 

More commercial ventures followed, in 
a network that is hard to disentangle. In 
2019 or 2020, Newmaster became a science 
adviser to Purity-IQ, a startup that, like 
Tru-ID, aims to certify the ingredients of 
foods, herbs, wine, cannabis, and other co- 
mestibles. According to Purity-IQ’s website, 
NHPRA performs lab tests for the company, 
which pledged $1 million to NHPRA in 2021. 

After the COVID-19 pandemic broke 
out, Newmaster cofounded ParticleOne, 
which sells software to assess indoor air for 
SARS-CoV-2. He is an adviser to Songbird 
Life Science, which offers COVID-19 tests 
and shares technology and executives with 
ParticleOne and Purity-IQ. (All three com- 
panies declined to comment, except to say 
concerns about Newmaster did not involve 
their own work, and in Purity-IQ’s case, that 
it stands by its tests.) 

Newmaster developed close ties with one 
sponsor, Herbalife, despite its checkered his- 
tory. Herbalife paid a $200 million fine in 
2016 to settle allegations by the U.S. Federal 
Trade Commission that it was operating a 
sophisticated pyramid scheme, and another 
$123 million in 2020 to settle federal charges 
that it engaged in bribery and other corrupt 
acts in China. Newmaster has touted Herb- 
alife’s products in promotional materials, 
effusively praised its cultivation practices 
after a 2018 visit to a Chinese tea farm, and 
lauded its efforts “to achieve excellence.” He 
also came to the company’s defense in 2019, 
when Indian researchers published a paper 
in the Journal of Clinical and Experimental 
Hepatology about a woman who died from 
liver failure, which the researchers associated 
with her use of Herbalife dieting products. 
In a letter to the editor, Newmaster—who 
has no medical background—castigated the 
paper. (Elsevier, the publisher, removed the 
paper from its website in 2020 after legal 
threats from Herbalife.) 

Web pages featuring Newmaster disap- 
peared from Herbalife’s website last month, 
after Science contacted the company. Herb- 
alife would not provide any comment for 
this story. 


EVEN AS NEWMASTER’S star was rising, 
some of his colleagues complained that he
made exaggerated claims. Newmaster never
worked for CBG, but Thomas Braukmann,
a former postdoc at the center who’s now 
at Stanford, says he saw him host tours of 
CBG’s handsome atrium and sequencing labs 
as if he ran the facility. “Those two buildings 
are my buildings,” Newmaster said in a 2019 
keynote speech at the CBD Expo, an industry 
conference on cannabis in Orlando, Florida, 
referencing the CBG complex. “I have 80 sci- 
entists working for me.” That appears to be 
a reference to CBG’s staff, who actually work
under Hebert. 

In his biography on UG’s website,
Newmaster noted a postdoctoral fellowship 
in “multidimensional matrix mathemat- 
ics and multivariate analysis” at Australia’s 
Commonwealth Scientific and Industrial 
Research Organisation, which says it has 
no record of Newmaster. (The claim was 
removed after Science asked Newmaster 
about it in January.) His CV listed a presti- 
gious Discovery grant from the Natural Sci- 
ences and Engineering Research Council of 
Canada (NSERC) for $198,000 over 5 years. 
NSERC says the grant was $11,500 for 1 year. 
He claimed a separate NSERC award for 
$240,000, but it was only worth $40,000. 

On its website, NHPRA listed many “stra- 
tegic partners,” including the U.S. Food and 
Drug Administration, U.S. Pharmacopeia, 
the Canadian Food Inspection Agency, the 
Canadian National Research Council, and 
the American Botanical Council. None has 
any defined relationship with NHPRA, 
they told Science. (In December 2021, after 
Science contacted the groups, NHPRA’s web- 
site was replaced with a notice that it would 
be back in 2022.) In his 2019 cannabis speech, 
Newmaster also claimed links with U.S. 
regulators and standards boards that those 
groups say don’t exist. 

In one particularly odd boast during an 
October 2020 radio interview, Newmaster 
said he was working on SARS-CoV-2 tests, 
in part at the request of the U.S. Centers for 
Disease Control and Prevention (CDC), in the 
summer and fall of 2019, months before the 
COVID-19 pandemic erupted. “In the scien- 
tific community we were already sequencing 
samples, blood samples, saliva samples, and 
looking at this virus,” he told an incredulous
host. A CDC spokesperson could not locate 
information about working with Newmaster. 

His colleagues complained of other kinds 
of dishonesty, too. In 2010, several UG scien- 
tists say, a student reported that Newmaster 
had taken large portions of his course ma- 
terials from internet sites. “I was absolutely 
floored,” says Wright, who co-taught that 
course with him. Science obtained a sample 
of the documents and verified substantial 
copying and pasting from Wikipedia and 
elsewhere. When Wright confronted him, 
Newmaster seemed unperturbed, she says: “I
lost hope in him as a scientist at that time.”
UG quietly required Newmaster to fix the
material, Wright and others say. Science also
found plagiarized sections in several of his
published papers, including one on millet
identification in Southeast India (see graphic,
p. 489). Jose Maloles, the paper’s first author,
says it was based on his undergraduate thesis,
but could not recall how it was drafted.


Uncanny resemblance

During a 2020 online training for the Association of Food and Drug Officials about
cannabis cultivar identification and purity verification, Steven Newmaster presented data
from other researchers, and even from completely different fields, as his own.

“Unique signatures”
In his talk, Newmaster described how “unique signatures” helped his team identify
coffee cultivars from Guatemala, Colombia, Tanzania, and Brazil. But the image he
showed was identical to one in a paper about coffee identification published in the
Journal of Agricultural and Food Chemistry by a Japanese research group in 2012.

Bad chemistry
This graphic, which Newmaster said showed data from his work on cannabis
identification, is identical (including the added numbers and text along the axes)
to one in a paper about identifying ginseng types, published in Analytical and
Bioanalytical Chemistry by a different research group in 2012.
[Graphic axes: PC 2 loadings versus chemical shift (ppm), 8 to 1.]

Into the weeds
Newmaster said this slide showed nuclear magnetic resonance profiles for three
cannabis strains; he added photos of each (circled here). But the graphic is
identical to one showing arrest data for 50 U.S. states that appears on the
Comprehensive R Archive Network, a support site for the programming language R.

In a 2020 promotional video made by 
Purity-IQ, Newmaster warned about the 
risks of data manipulation. “We could have 
all the testing in the world,” he said, “and if 
that data could be counterfeited or could be 
changed in any way it doesn’t really matter 
how good the test is.” One month later, in a 
training video for the Association of Food 
and Drug Officials in which he promoted 
Purity-IQ testing, Newmaster displayed 
graphics from other sources without credit 
and described them as his own work, an 
analysis of his talk and PowerPoint slides 
shows (see graphic, p. 487). 

“Here's the little experiment that we ran,’ 
Newmaster says in the video, calling it “a real 
life scenario” to guide industry quality con- 
trol. But the image he showed, purportedly 
representing an analysis of cannabis strains, 
is identical to one assembled by other re- 
searchers that depicts U.S. arrest data. 

Independent scientists identified more 
serious problems in Newmaster’s work, 
such as an analysis of sarsaparilla—a tropi- 
cal plant used to treat joint pain—published 
in 2020 with other NHPRA researchers. 
Stanford’s Braukmann and Damon Little, a 
bioinformatics expert at the New York Bo- 
tanical Garden, both examined the genetic 
sequences Newmaster provided, and found 
those labeled Indian sarsaparilla were actually
near-exact matches for Escherichia coli,
a common experimental bacterium. Prasad
Kesanakurti, corresponding author for the
paper, says the data merely reflected common
E. coli contamination, and offered to provide
the assembled plant sequences for review. 
Braukmann says only an examination of the 
raw data could clarify what went wrong. The 
paper is “an example of poorly done science,” 
he says. “It makes me not trust anything that 
comes out of [NHPRA].” 


THE INQUIRY NOW UNDERWAY at UG was trig- 
gered by Thompson, who in 2012 was one 
of the first two students to enroll when 
Newmaster helped launch UG’s undergraduate
biodiversity major. Newmaster
asked Thompson to work on a paper com- 
paring the cost of traditional taxonomic 
typing and DNA barcoding for identifying 
forest plants. Newmaster provided the sum- 
mary data; Thompson had to analyze them 
and draft the paper. “We’re getting one-on-one
time with this famous, supersuccessful,
important professor,” Thompson recalls
thinking. The resulting 2014 paper in
Biodiversity and Conservation was his first.

Years later, Thompson grew queasy. 
He realized the perfect species identifica- 
tion claimed in the paper was virtually 
impossible for some of the plants. And 
Newmaster had never shown him the raw 
data or uploaded it to BOLD or GenBank, 
the standard sequence repository. In early 
2020, Thompson asked UG to investigate. 
“I wasn’t 100% confident that it was fraudulent,”
he says. “I was 100% confident that
it was worth asking the question.” 

In September and October 2020, in response
to Thompson’s inquiry, Newmaster’s
collaborator Ragupathy deposited thousands
of sequence records, purportedly
obtained for the forest paper, in GenBank.
(Around the same time, he uploaded 126
records for the 2013 supplements paper.)
Thompson also examined the specimen site
data and found that 80% precisely matched
data collected earlier for another student’s
thesis, at a different site hundreds of kilometers
away.


“They thought that ...
I didn’t have a lot of power,
that they could squash me.”

Ken Thompson, Stanford University

Thompson—who later also detected some 
cases of Newmaster’s apparent image fab- 
rication or plagiarism—says UG admin- 
istrators slow-walked his request for an 
investigation, recast it as an informal query, 
and in early 2021 rejected his claims as in- 
sufficiently supported. “They thought that 
I was just one person, and I didn’t have a 
lot of power—that they could squash me,” 
he says. He then asked the editor of
Biodiversity and Conservation to conduct his
own review. But the editor deferred to UG. 

In May 2021, Thompson self-published 
his concerns and posted a related com- 
mentary on a popular biodiversity blog, 
Eco-Evo Evo-Eco. “Doing this alone behind 
the scenes has been incredibly isolating,” 
he wrote. “I ... hope that by sharing an 
evidence-based critique of our paper some 
people will choose to support me.” Indeed, 
Hebert soon added a note of support. 

Hebert says Thompson’s move re- 
vived his own long-running doubts about 
Newmaster’s work. He reached out to six 
other scholars who could offer authoritative 
assessments. They reexamined the forest 
paper and also scrutinized the supplements 
article and a third paper, published in 2013 
in the Canadian Journal of Forest Research, 
which found that DNA barcoding of fecal 
matter from woodland caribou worked bet- 
ter than conventional methods to determine 
the animals’ diets. In June 2021, the eight 
requested the misconduct investigation by 
UG. More recently, some of them also asked 
the publishers to retract the supplement 
and caribou papers. Hebert says a request 
to retract a fourth paper is in preparation. 

The allegation letter details the problems 
Thompson and Kuzmina detected and many 
others. It notes that the papers say barcoding 
for both the forest and supplement papers 
was carried out by the Canadian Centre for 
DNA Barcoding, also led by Hebert, but that 
the center has no record of that work. The 
letter adds that no sequences were depos- 
ited in BOLD or GenBank before publication 
of either paper, and that some of the data 
Ragupathy belatedly uploaded in 2020 con- 
tradicts the papers’ claims. For example, in 
the supplements paper, Newmaster’s group 
labeled a product as the laxative Senna alex- 
andrina, but the sequence came from another 
legume. Moreover, some of the sequences con- 
tained errors that precisely matched those in 
sequences previously submitted by other re- 
searchers for several other studies. 

In his response to the allegation letter, 
obtained by Science, Newmaster strenuously 
disputed the concerns. The close correspon- 
dence Kuzmina found with samples taken 
elsewhere reflected normal species similar- 
ity in the forest ecosystems, he said. New- 
master insisted his samples were correctly 
identified, and that innocent technical errors 
could account for matches between rare or 
unique mistakes in his sequences and ones 
published by other researchers. 

Contradicting the papers, he said much 
of the barcoding was done not at the Cana- 
dian Centre for DNA Barcoding, but at an- 
other UG lab, the Advanced Analysis Centre 
(AAC) Genomics Facility, or in Newmaster’s 
personal “artisanal genomics lab.” Yet he
conceded he could not locate the sequencing
records. As to why the sequences were not
made public at the time, Newmaster says
he submitted them to BOLD, and he blames its
staff for mishandling them. (Hebert, BOLD’s
scientific director, says records show
Newmaster never submitted the data, and
even if he had, BOLD’s published policy requires
a study’s project manager to ensure
the data go to GenBank as well.)

Newmaster also rejected allegations that
he concealed business interests in his papers.
“[T]he only income I have had during
my tenure at the University of Guelph is my
University salary,” he wrote. Science filed
a request with UG for Newmaster’s outside
income declarations, including from
his own companies; UG Vice President for
Research Malcolm Campbell responded
that the records are exempt from disclosure.
Purity-IQ, Songbird, and ParticleOne
declined to comment about Newmaster’s
compensation.


Borrowed words

Science found several instances of apparent plagiarism in Steven Newmaster's work. For example, a 2011 paper about millets on which he was the last author
contained text from two earlier papers (highlighted here), including references (bold) that Newmaster and his co-authors failed to add to their own reference list.

Text in a 2011 paper by Steven Newmaster and colleagues

... maintain body temperature and energy levels after delivery ...
Maloles et al., “The Fine Scale Ethnotaxa Classification of Millets in Southern India,” Journal of Ethnobiology, 2011

Our research was conducted with the Malayali in the Kolli Hills, which lie in Tamil Nadu's Talaghat Plains (Bohle 1992), one ...
Maloles et al., “The Fine Scale Ethnotaxa Classification of Millets in Southern India,” Journal of Ethnobiology, 2011

Text in original papers

... consumption and for festivals. Their use as a special food in prenatal care of women and their fodder quality are important ... Even nowadays ...
R. Rengalakshmi, “Folk biological classification of minor millet species in Kolli Hills, India,” Journal of Ethnobiology, 2005

Elizabeth Finnis, “The political ecology of dietary transitions: Changing production and consumption patterns in the Kolli Hills, India,” Agriculture and Human Values, 2007


Science asked Little, from the New York
Botanical Garden, to review the allegation
letter, Newmaster’s response, and numerous
related documents and provide an independent
perspective on the case. Little calls
the large number of precisely replicated errors
in DNA sequences “bizarre” and suggestive
of data manipulation. “People will
get hit by cars,” he says. “But will two of
them be hit by cars while walking across the
same intersection on their hands at 4 a.m.?”
Newmaster’s claim that forest ecology 
could explain the 80% match between the 
data in the forest paper and those in the 
graduate student thesis was “unbelievably 
wrong,” Little adds. And the claim that 
Newmaster and AAC both lost the same 
sequencing records is implausible, he says, 
given how zealously scientists and service 
providers normally safeguard such data. 
Overall, Little calls the allegations against 
Newmaster credible. “The papers are at 
best inaccurate and at worst fraudulent,” 
he says. “The end result is the same: They 
should be retracted and not trusted.” 


IN OCTOBER 2021—5 months after Thomp- 
son had gone public with his concerns— 
Biodiversity and Conservation reconsid- 
ered and agreed to retract the forest paper. 
GenBank has removed the DNA sequences 
purportedly associated with the paper. The 
UG inquiry into the three papers is ongo- 
ing. At UG’s request, Canada’s Secretariat 
on Responsible Conduct of Research ex- 
tended the deadline for a decision until 
June. Newman, the former dean, says he 
hasn’t seen the allegations, but “if Steve 
actually fabricated data for a publication 
... I would just expect that’s career death.”
Yet, the composition of the investigative 
committee makes Hebert and other critics
worry UG will again dismiss the allegations
against Newmaster. University rules
require that such committees comprise the 
dean of the College of Biological Sciences, 
the associate vice president for research, 
and a representative from outside UG. But 
the final committee consists of a business 
professor, the dean of UG’s veterinary col- 
lege, and a psychologist from a nearby 
university—none with a background in the 
relevant science. (UG allows an accused sci- 
entist to challenge the panel’s membership 
if they suspect bias; it’s not clear whether 
Newmaster did so.) In an email to Science, a 
UG spokesperson wrote that the investiga- 
tion is using “a fair and standard process” 
and the university will “take appropriate ac- 
tion based on the results.” 

Given Newmaster’s high profile—and the 
way the university has handled the case so 
far—UG cannot be trusted to carry out an 
even-handed probe, Thompson says. He describes
his treatment by UG after he tried to
get his paper investigated as “gaslighting,”
that is, being provided with a false impression that
his concerns were taken seriously. “We need 
an independent body [from outside UG] 
to review cases like this,” Thompson says. 
“It’s the only solution to stop history from
repeating itself.” 


With reporting by Meagan Weiland and 
Jenny Carpenter. This story was supported by 
the Science Fund for Investigative Reporting. 


Chasing after methane’s ultra-emitters


Leaks from oil and gas companies contribute substantially to global warming 


By Felix Vogel 


The latest emissions gap report from
the United Nations Environment
Programme highlighted that current
and planned mitigation measures are
insufficient to achieve the goal of the
Paris Agreement of limiting global
warming to 1.5°C above preindustrial
temperatures (1). When national representatives
gathered at the 26th UN Climate Change 
Conference of the Parties (COP26) in Glasgow 
last November, a plan to rapidly decrease 
methane emissions emerged, with over 100 
countries joining the Global Methane Pledge 
aimed at reducing global methane emissions 
by at least 30% from 2020 amounts by 2030. 
While policy-makers try to enact climate- 
related legislation, scientists are trying to 
identify more cost-effective or quick ways 
to curb greenhouse gas emissions. On page
557 of this issue, Lauvaux et al. (2) highlight
how methane emissions can be effectively
reduced by targeting ultra-emitter sites
identified by the European Space Agency’s
satellite-based TROPOspheric Monitoring Instrument
(TROPOMI) (3).

Methane is a gas with a strong climate im- 
pact but short atmospheric lifetime of about 
9 years. Mitigation of methane emissions can 
thus lead to faster reductions in atmospheric 
concentrations compared to the reduction of 
longer-lived greenhouse gases such as carbon 
dioxide (CO₂). Consequently, research on
atmospheric methane and the importance of
reducing its emissions for achieving the 1.5°C 
goal have been of increasing interest in re- 
cent years (4, 5). 
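
To make the lifetime argument concrete, the sketch below assumes simple first-order (exponential) decay with the roughly 9-year lifetime quoted above. That is a simplification of real atmospheric chemistry, and the function name and time points are illustrative choices, not taken from the cited studies.

```python
import math

METHANE_LIFETIME_YEARS = 9.0  # approximate lifetime quoted in the text


def fraction_remaining(years: float, lifetime: float = METHANE_LIFETIME_YEARS) -> float:
    """Fraction of an emitted pulse still airborne, assuming first-order decay."""
    return math.exp(-years / lifetime)


# Illustrative time points (assumed, not from the article).
for t in (9, 20, 30):
    print(f"after {t:>2} years: {fraction_remaining(t):.0%} of the pulse remains")
```

Under this assumption, a little over one-third of a methane pulse remains after one lifetime, which is why emission cuts could show up in atmospheric concentrations within a few decades.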

As the second most harmful anthropogenic
greenhouse gas (after CO₂), methane
has long been a target for emission reduction
in many countries. However, despite
these efforts, over 350 million tonnes of
anthropogenic methane are still being emitted
globally every year, with roughly one-third
coming from the fossil fuel industry (6). In
contrast to CO₂, which is produced when
fossil fuel is burned completely, methane
emissions are often the product of waste,
such as from leaks or the incomplete combustion
of fossil fuel, or an unwanted by-product
of agricultural activities. Those
who favor methane regulations may argue
that, unlike rules governing CO₂ emissions,
methane regulations may incentivize more
efficient use of resources rather than causing
a feared decrease in productivity, for
example, by reducing methane losses in the
oil and gas sector through leaks and unnecessary
releases.

For methane emissions, past data have
indicated that a small number of super-emitters
are responsible for a disproportionately
large share of the emissions. These
super-emitters include thousands of oil and
gas wells in Canada (7) and the natural gas
distribution infrastructure in US cities (8).
One study that monitored methane emissions
across more than 250,000 facilities in
California reported that 30 of those emitters
were responsible for 20% of the total
methane emissions in the state (9).

Some oil and gas facilities, such as this refinery in
Belarus, were identified as global methane emission
hotspots by the latest satellite data analysis.
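
The concentration of emissions reported for California can be put in numbers with a short calculation. This is only a back-of-the-envelope sketch using the figures quoted above. The variable names are illustrative, and 250,000 is treated as the facility count even though the text says "more than".

```python
TOTAL_FACILITIES = 250_000      # "more than 250,000 facilities" (lower bound from the text)
TOP_EMITTERS = 30               # facilities singled out by the study
TOP_SHARE_OF_EMISSIONS = 0.20   # their share of statewide methane emissions

# Fraction of all monitored facilities that the top emitters represent.
share_of_facilities = TOP_EMITTERS / TOTAL_FACILITIES

# How many times more an average top emitter releases than the average facility.
concentration_ratio = TOP_SHARE_OF_EMISSIONS / share_of_facilities

print(f"top emitters are {share_of_facilities:.3%} of facilities")
print(f"each emits roughly {concentration_ratio:,.0f} times the facility average")
```

Because the facility count is a lower bound, the true ratio would be at least this large.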

Lauvaux et al. identified the leaders 
among these super-emitters—the so-called 
“ultra-emitters.” Using global data collected 
by the TROPOMI satellite over several years, 
the authors assembled a list of ultra-emitter 
sites responsible for a total of roughly 8 million
tonnes of methane per year, with a
warming potential equivalent to 250 million
tonnes of CO₂ (10). To put this into perspective,
250 million tonnes of CO₂ is the
carbon footprint of more than 40 million 
people. This finding is even more notable 
when one considers that the TROPOMI sat- 
ellite was not designed to track emissions 
at the facility scale. However, the amount 
of emission from these ultra-emitters is so 
large that TROPOMI could track it with a 
spatial resolution in the 5-km range. 
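
The totals quoted above imply two ratios worth making explicit. The sketch below simply divides the article's own rounded figures. The variable names are illustrative, and the resulting values are what the text implies rather than official conversion factors.

```python
ULTRA_EMITTER_CH4_MT = 8.0   # million tonnes of methane per year (from the text)
CO2_EQUIVALENT_MT = 250.0    # million tonnes of CO2 equivalent (from the text)
PEOPLE_MILLIONS = 40.0       # "more than 40 million people" (from the text)

# Warming-potential factor implied by the two totals (tonnes CO2 per tonne CH4).
implied_gwp = CO2_EQUIVALENT_MT / ULTRA_EMITTER_CH4_MT

# Per-person footprint implied by the comparison (tonnes CO2 per person per year).
implied_footprint = CO2_EQUIVALENT_MT / PEOPLE_MILLIONS

print(f"implied warming factor: about {implied_gwp:.0f} t CO2 per t CH4")
print(f"implied footprint: about {implied_footprint:.2f} t CO2 per person")
```

Since the text says "more than 40 million people", the per-person figure here is an upper estimate of the footprint the comparison assumes.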

Researchers have quantified CO₂ emissions
from super-emitters by comparing
land-based measurements with satellite-based
estimates, and combining those values
with calculations of downwind plume
concentrations using atmospheric modeling
(11). However, in contrast to CO₂, the locations
and emissions of major methane emitters
are difficult to track, as methane emissions
are an unwanted by-product of the oil
and gas industry and often go unreported.
The creation of a global inventory of methane
ultra-emitters provides crucial information
for targeting the strongest emitters. This
inventory should be a useful tool for policy-makers
seeking to enact effective regulations to combat
climate change and alleviate climate-related
economic calamity in the long run.

Given the 3 years left in the designed op- 
erational life span of TROPOMI, the project 
will continue to provide data for monitor- 
ing the known ultra-emitters and detecting 
new ones. Looking past TROPOMI, multi- 
satellite approaches are emerging. For ex- 
ample, by combining TROPOMI data with 
high-resolution satellites that are designed 
to track facility-scale emissions, scientists 
can now precisely determine the source locations
and quantify emissions, such as during
a recent blow-out event in the Eagle Ford
Shale region in Texas, where nearly 5000
tonnes of methane were emitted in only 20 
days because of a control loss at a gas well 
(12). The global capacity for satellite-based 
monitoring of atmospheric methane should 
increase in the coming years, with a whole 
fleet of satellites waiting to join the hunt for 
methane sources. 
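
For a sense of scale, the blow-out figures above can be converted to an average hourly rate, the unit used when comparing measurements across instruments. This minimal sketch assumes the release was spread evenly over the 20 days, which the text does not state, and the function name is illustrative.

```python
def mean_rate_kg_per_hour(total_tonnes: float, days: float) -> float:
    """Average emission rate in kg/h for a release spread evenly over `days`."""
    return total_tonnes * 1000.0 / (days * 24.0)


# "Nearly 5000 tonnes" in 20 days, so this is an upper estimate of the average.
blowout_rate = mean_rate_kg_per_hour(5000, 20)
print(f"average rate: about {blowout_rate:,.0f} kg of methane per hour")
```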

The next generation of satellites is expected
to have better spatial and temporal
coverage, as well as improved resolutions and 
accuracies. However, many of the planned 
satellites may still struggle with measure- 
ments performed over water, through clouds, 
or at nighttime. They may also have shorter 
operational periods than ground-based net- 
works. These satellites could struggle to track 
intermittent emitters with emissions under 
their detection threshold. Thus, it is impor- 
tant to integrate these satellite-based projects 
with drone-based and ground-based methods 
when tracking and quantifying emissions at 
the regional scale. 

Any future global methane monitoring 
system, such as the International Methane 
Observatory (IMEO) of the UN Environment 
Programme and the European Commission 
(13), will have to combine observations across 
different scales and techniques to be truly 
comprehensive. As a first step, the differ- 
ent methods and algorithms will have to be 
standardized or at least be made compatible. 
For example, a measurement of kilograms 
of methane emission per hour at a specific 
site by satellite A should be reproducible us- 
ing data collected by satellite B or by a drone 
at the same site. Such harmonization efforts 
and the collation of best practices for atmo- 
spheric monitoring are currently being ad- 
vanced by the Integrated Global Greenhouse 
Gas Information System (IG³IS) of the World
Meteorological Organization (14). These on- 
going projects will continue to provide data to 
scientists, site operators, policy-makers, and 
citizens, to spur future research, help identify
operational issues, and develop cost-efficient
mitigation strategies. They will also help to
inform the public of where the planet is head- 
ing in the grand scheme of climate change 
and how successful the Paris Agreement and 
the increased ambitions announced at the 
COP26 in Glasgow are going to be. 


Tethering gene regulation to chromatin organization


A two-tiered system of 
chromatin structure ensures 
robust gene expression 


By Marissa Gaskill and Melissa Harrison 


Precise regulation of gene expression
is crucial to cellular identity, and 
changes to gene expression profiles 
drive developmental transitions. 
Distinct enhancer elements interact 
with promoters to control cell type- 
specific gene expression. These enhancers 
are often located at a considerable distance 
from the genes they regulate. Chromosome 
looping and three-dimensional (3D) ge- 
nome organization have been suggested to 
bring the correct enhancers and promot- 
ers together to facilitate the exquisite spa- 
tiotemporal gene regulation required for 
successful development. Nonetheless, it 
remains unclear how genome organization 
is precisely regulated to ensure that distant 
enhancers locate the correct promoter to 
reliably drive gene expression. On page 566 
of this issue, Batut et al. (1) define a distinct 
class of cis-regulatory elements called teth- 
ering elements that promote interactions 
between enhancers and promoters. They 
propose a two-tiered system of genome or- 
ganization: tethering elements that connect 
distant enhancers to promoters and bound- 
ary elements that ensure the specificity of 
these enhancer-promoter interactions.

Within the nucleus, chromosomes are or-
ganized at multiple levels, including loops, 
topologically associating domains (TADs), 
and compartments. Investigations of the 
role of chromatin structure have largely 
focused on TADs, which are chromosomal 
regions enriched for self-interaction and 
delimited by insulator sequence elements. 
Despite TADs being a conserved feature 
of the eukaryotic genome, the importance 
of this 3D structure to gene regulation remains
controversial. Enhancers preferentially
interact with promoters within the
same TAD. Indeed, disruption of individual
TAD boundaries results in defects in gene
expression and disease (2, 3). By contrast,
additional studies demonstrated that altering
TAD boundaries at a genome level resulted
in minimal effects on gene expression
(4, 5). Furthermore, despite the pronounced
changes in gene expression that drive development,
TADs are notably stable over
time and between cell types (6, 7). Because
of these conflicting observations, there has
been no overarching framework for how 3D
chromatin structure affects enhancer-promoter
contacts to promote gene expression.

Tiers of genome organization
Topologically associating domain (TAD) boundaries are enriched
for insulator binding proteins and prevent regulatory elements
outside the TAD from contacting promoters inside the TAD.
Tethering elements are bound by pioneer transcription factors,
possibly in hubs, bringing together enhancers and promoters
for rapid gene activation (orange arrows).

Using Micro-C, a technique that enables 
single-nucleosome maps of 3D chromatin 
interactions, Batut et al. generated a de- 
tailed interaction map of the genome in 
Drosophila melanogaster embryos. The 
high degree of resolution provided by 
Micro-C allowed the authors to identify a 
new class of regulatory elements, tethering 
elements. Tethering elements do not acti- 
vate transcription in transgenic reporter 
assays and therefore are distinct from en- 
hancers. Instead, they facilitate interactions 
between enhancers and promoters. Similar 
enhancer-promoter interactions have been 
identified genome-wide in mammalian cells 
(8), suggesting that this is a conserved fea- 
ture of genome structure. Focusing on the 
well-studied homeobox (Hox) locus, Batut 
et al. used live imaging of transcriptional 
dynamics to demonstrate that tethering
elements promote the rapid activation of
gene expression and showed that disruption
of these elements has phenotypic consequences
in the adult. As suggested by
prior studies (9), the authors show that TAD
boundaries within the Hox locus prevent interactions
between enhancers and promoters
located in different TADs. Disruption
of TAD boundaries affects gene expression, 
but this can vary depending on the regula- 
tory elements that are in the neighboring 
TAD. Deletion of the genomic sequence of 
either tethering elements or TAD boundar- 
ies does not disrupt the formation of the 
other. Thus, tethering elements and TADs 
have independent contributions to chroma- 
tin organization and the regulation of gene 
expression (see the figure). 

In addition to functional differences, 
tethering elements and TAD boundaries 
may be formed at discrete times. Chromatin 
structure is established in early development
(10, 11). A recent imaging-based strategy
to interrogate chromatin structure demonstrated
that chromatin loops are evident
before TAD formation, supporting the inde- 
pendent formation of loops and TADs (12). 
Batut et al. showed that tethering elements 
and TAD boundaries are defined by a dis- 
tinct repertoire of binding factors. Whereas 
TAD boundaries are enriched for insulator 
binding proteins, tethering elements are 
bound by the pioneer transcription factors 
Zelda, GAGA factor (GAF), and Grainyhead. 
Pioneer factors are capable of binding 
closed chromatin, promoting chromatin
accessibility and thereby activating gene
expression. Zelda and GAF are essential for 
defining chromatin accessibility and acti- 
vating gene expression in the early D. me- 
lanogaster embryo (13, 14). Similarly, both 
factors have been implicated in the establishment
of 3D chromatin structure (10, 11).
A mechanistic connection between binding 
of Zelda and tethering element function is 
suggested by the finding that Zelda is es- 
sential for a subset of early formed loops 
between enhancers and promoters (12). The
enrichment of multiple pioneer factor bind- 
ing sites in tethering elements suggests that 
a shared property of these factors may be 
essential for tethering. 

Although it remains for future studies to 
determine whether these pioneer factors 
are important for tethering element func- 
tion and, if so, how they promote tether- 
ing, it is notable that GAF and Zelda are 
not uniformly distributed in the nucleus. 
Zelda forms hubs of high local concentra- 
tion, which are important for its ability 
to recruit other transcription factors and 
potentiate gene expression (15). Thus, this 
property may be important for bringing 
enhancers and promoters in proximity. 
The recent identification of condensates 
or phase-separated domains of locally high 
concentrations of transcription factors has 
provided a framework to begin to under- 
stand these enhancer-promoter interac- 
tions. These transcription factor hubs may 
bring multiple enhancers and promoters 
together and, in this way, facilitate inter- 
action between these cis-regulatory ele- 
ments. It is not yet clear whether pioneer 
factors form condensates that recruit addi- 
tional factors to facilitate chromatin acces- 
sibility and enhancer-promoter interaction 
or if pioneer factors mediate chromatin ac- 
cessibility and this then drives subsequent 
hub formation. Defining these processes 
will be important for understanding how 
gene expression is precisely regulated to 
control development.

INFECTIOUS DISEASE


When viruses become more virulent 


Natural selection favors virulence when it is coupled with increased viral transmission 


By Joel O. Wertheim


The evolution of virulence (the degree
to which a pathogen sickens, kills, or
otherwise reduces its host’s fitness)
depends on the biology of infection
and transmission (1). A more virulent
virus may be less transmissible be- 
cause in killing its host, it reduces the op- 
portunity for transmission. But virulence 
and transmissibility can be intrinsically 
linked, so that to maintain or increase infec- 
tiousness, a virus must be virulent. On page 
540 of this issue, Wymant et al. (2) describe 
the emergence of a more virulent and trans- 
missible variant of HIV that has spread 
to 102 known cases, mostly in 
the Netherlands, over the past 
decade. This finding raises 
questions about the selective 
pressures and molecular mech- 
anisms that drive increased vir- 
ulence and transmission. 

To appreciate the nuances in 
the evolution of virulence, it is 
worth revisiting the apocryphal 
tale of how a bioweapon—the 
extraordinarily lethal myxoma 
virus, which was released in 
Australia to exterminate the 
European rabbit infestation— 
quickly attenuated to become a 
benign infection. Myxoma virus 
is a vector-borne pathogen that, 
when introduced to Australia, 
had a >99% fatality rate. Less 
virulent myxoma virus evolved, 
but it continues to circulate and 
kill rabbits in Australia with a 50% mortal- 
ity rate. Ample highly virulent myxoma virus 
strains persist, indicating that evolution to- 
ward attenuation did not drive the most vir- 
ulent variants extinct (3). The key to main- 
taining virulence may lie in the inability of 
myxoma virus to replicate in its insect vec- 
tors; high titers of the virus must be main- 
tained in rabbits to ensure transmissibility. 
Thus, virulence (which achieves high viral 
load) is critical to its evolutionary success. 

In HIV, the biology is clear: There can be 
no decoupling of virulence and transmis- 
sibility (4). Upon initial HIV infection, the 
viral load peaks and then stabilizes to produce
a prolonged, symptom-free chronic infection
that can last a decade or more. This
stabilized viral load is known as the set- 
point viral load and can be quite variable 
across infected people, differing by several 
orders of magnitude. This variability can 
partly be traced back to genetic variation in 
the virus itself. People with higher set-point 
viral load, left untreated, progress to AIDS 
more rapidly (5). Those same people will 
also be more infectious during this time pe- 
riod because higher viral load leads to more 
viral transmission. 

Colored transmission electron micrograph showing HIV-1 virions
emerging from T cells (red), continuing the infection cycle. Virulence and
transmission of HIV-1 are influenced by viral load.

By contrast, people with lower set-point
viral load will live symptom-free longer and
be less infectious over this period. Therein
lies the trade-off from the perspective of
the virus. Burn bright and be brief, or smol- 
der and persist? For HIV, natural selection 
favors a middle ground, but the trajec- 
tory depends on the host population and 
the initial starting conditions. In Uganda, 
where heterosexual HIV transmission pre- 
dominates, there are two major subtypes 
of HIV: A and D. Subtype D was the most 
prevalent variant in the 1990s and is associ- 
ated with higher viral load and more rapid 
progression to AIDS than that of subtype A. 
Over the past two decades, subtype D has 
been gradually outcompeted in Uganda 
by the comparatively less virulent subtype 
A, which itself may be becoming even less 
virulent (6). The selective driver behind this 
change remains unknown. 


HIV in the United States is evolving in 
the opposite direction. Viral load at diagno- 
sis has been increasing every decade since 
the pandemic was first identified in 1981 (7), 
and higher viral loads are found in people 
belonging to larger HIV transmission clus- 
ters (8). Notably, this trend toward more 
virulent HIV is not emanating from a single 
cluster of transmission. Like the myxoma 
virus, this adaptation appears to be found 
across the spectrum of HIV genetic diver- 
sity in the United States. By contrast, the 
virulent cluster described by Wymant et 
al. emanated from a single cluster on the 
phylogenetic tree, suggesting a single adap- 
tive event in HIV evolution. However, the 
specific mutations that are re- 
sponsible for increased viru- 
lence were not identified. In 
both Europe and the United 
States, the genetic mechanisms 
underpinning viral adaptation 
remain unclear. 

Antiretroviral therapy upon 
diagnosis reduces viral load to 
a point at which it is undetect- 
able. Treatment not only im- 
proves survival among those 
infected with HIV but also 
largely prevents onward sexual 
transmission. Evolutionary 
modeling suggested that this 
intervention—which has been 
standard of care in the United 
States for a decade—would se- 
lect for increasingly virulent 
HIV. By decoupling the oppor- 
tunity for transmission from 
disease progression, HIV with higher viral 
load would be favored. However, HIV in the 
United States apparently began increasing 
in virulence before the test-and-treat era. 
Why, after more than a century of human- 
to-human transmission, is HIV virulence 
still evolving, and how will this virus evolve 
in response to efforts to “end the epidemic” 
through substantial reduction of transmis- 
sion over the next decade? 

Observing the emergence of more viru- 
lent and transmissible HIV is not a public 
health crisis. Standard public health actions,
including molecular HIV surveillance
(9), facilitating linkage to care, and partner
notification, are still the best options when
faced with a rapidly growing cluster of more 
virulent HIV. Let us not forget the overreaction
to the claim of “Super AIDS” in 2005,
when alarm was raised over a rapidly progressing,
multidrug-resistant HIV infection
found in New York (10) that was ultimately
restricted to a single individual. 

These findings are relevant to the 
COVID-19 pandemic. Although it is cer- 
tainly possible that severe acute respiratory 
syndrome coronavirus 2 (SARS-CoV-2) will 
evolve toward a more benign infection (11),
like other “common cold” coronaviruses, 
this outcome is far from preordained. At the 
beginning of the COVID-19 pandemic, there 
was an underappreciation of the rapidity 
with which selection would lead to changes 
in transmissibility and virulence (12). But the
ultimate outcome depends on whether and 
how SARS-CoV-2 transmission and virulence 
are linked. SARS-CoV-2 variants demon- 
strate that this virus is repeatedly evolving 
to be more transmissible, and not all of these 
adaptive variants are demonstrably more 
virulent. However, the Delta variant that 
dominated global cases in late 2021 shows 
how SARS-CoV-2 could evolve to be both 
more transmissible and more virulent (13). 
The Omicron variant is more transmissible, 
but whether it is more or less virulent in im- 
munologically naive individuals is unclear. 
Immune evasion, receptor binding efficiency, 
and tissue tropism may contribute to the 
evolution of virulence (14, 15). Deciphering 
the mechanisms of SARS-CoV-2 virulence 
and its relationship with transmission and 
immunity will be essential to understand 
how and why its virulence may evolve. But 
the HIV and SARS-CoV-2 pandemics show 
how viruses can and will evolve higher virulence
when favored by natural selection.

The adenine methylation debate


N⁶-methyl-2’-deoxyadenosine (6mA) is less prevalent
in metazoan DNA than thought 


By Konstantinos Boulias and
Eric Lieberman Greer


Adenine methylation, forming N⁶-methyl-2’-deoxyadenosine
(6mA), is
a prevalent DNA modification in pro- 
karyotes and has recently been pro- 
posed to exist in multicellular eukary- 
otes (metazoans) to regulate diverse 
processes, including transcription, stress 
responses, and tumorigenesis. However, the 
existence of 6mA, and therefore its biologi- 
cal importance, in metazoan DNA has been 
debated by recent studies, which have either 
detected 6mA at much lower abundances 
than initially reported or failed to detect 6mA 
at all. On page 515 of this issue, Kong et al. 
(1) report the development of 6mASCOPE, a
quantitative method that deconvolutes 6mA 
in samples of interest from contamination 
sources. They detected low amounts of 6mA 
in fruit flies (Drosophila melanogaster), plants 
(Arabidopsis thaliana), and humans, which 
suggests that 6mA is much less abundant 
in these organisms than previously thought. 
These data suggest that a reassessment of 
6mA in eukaryotic DNA is warranted. 
The discovery of 6mA in multicellular 
eukaryotic DNA (2-4) was facilitated by the
development of highly sensitive detection
and mapping methodologies. These include 
ultrahigh-performance liquid chromatogra- 
phy coupled with tandem mass spectrometry 
(UHPLC-MS/MS), which has a detection limit 
of 0.1 to 1 parts per million (ppm) (5), and 
single-molecule real-time sequencing (SMRT- 
seq), a long-read DNA sequencing technique 
that maps methylated bases by quantifying 
rates of incorporation of complementary 
bases, which are altered when bases are 
modified (6). However, these methods have 
limitations: UHPLC-MS/MS cannot discrimi- 
nate the source of 6mA, which becomes prob- 
lematic when 6mA is of low abundance in the 
organism compared with the abundance in 
bacterial contaminants (7). Moreover, long- 
read sequencing methods, such as SMRT-seq, 
are error prone, and SMRT-seq requires high 
sequencing depth and loses accuracy when 
6mA is lower than 10 ppm (7, 8). Because of 
these limitations, several laboratories have 
been unable to detect 6mA, or they have 
detected 6mA at substantially lower concen- 
trations in metazoan genomes (7-10), which 
has led some to question whether 6mA is a 
directed DNA modification in metazoans. 
Kong et al. developed 6mASCOPE, a SMRT-seq
analysis method that quantitatively determines
in which species 6mA is present,
enabling discrimination between 6mA in the
metazoan genome and that in contaminating
microorganisms (see the figure).

Organisms with adenine methylation
To quantify the amount of N⁶-methyl-2’-deoxyadenosine (6mA) present in genomic DNA, single-molecule
real-time sequencing (SMRT-seq) data are analyzed with 6mASCOPE. In 6mASCOPE, small DNA fragments
are produced, adaptors are added, and high-coverage SMRT-seq is performed. 6mASCOPE is a reference-free
analysis method that deconvolutes SMRT-seq data to identify the source of 6mA.

The abundance of 6mA
Previously, ultrahigh-performance liquid chromatography
coupled with tandem mass spectrometry
(UHPLC-MS/MS) identified 6mA in prokaryotes and
metazoans. However, analysis with 6mASCOPE found
that 6mA was overestimated in metazoans owing to
contamination with DNA from microbiota or food.

Existing SMRT-seq methods compare the 
interpulse duration [(IPD) the time between 
successive base additions, which is altered 
by DNA modifications] of native template 
with the reference genome, ignoring con- 
taminating DNA with abundant 6mA. Kong 
et al. overcome this limitation by devising a 
reference-free approach. By using the long- 
read sequencing to exclusively sequence 
short (200 to 400 base pairs) DNA sequences, 
each molecule is heavily resequenced, which 
leads to higher-confidence circular consen- 
sus sequence (CCS) base-calling accuracy. A 
metagenomic analysis allows for CCS reads 
to be mapped to both the genome of inter- 
est and to potential contamination sources 
by using a comprehensive set of genomes, 
including those from microbiota. The 6mA/A 
ratios were estimated using a machine learn- 
ing model trained with a broad range of 
6mA content. As a proof of principle, the 
authors performed 6mASCOPE on two uni- 
cellular eukaryotes with high amounts of 
6mA, Chlamydomonas reinhardtii (11) and 
Tetrahymena thermophila (12). They con- 
firmed high 6mA in these protists and fur- 
ther refined the methylation motif (VATB: V 
= A, C, or G; B = C, G, or T) and preference 
of 6mA to occur in specific locations in the 
linker regions between nucleosomes. 
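
The VATB notation above can be made concrete with a short, illustrative sketch. It is not part of 6mASCOPE: the function names `motif_to_regex` and `find_vatb_adenines` are hypothetical, and the code only scans one strand of a plain sequence string for VATB sites, reporting the position of the candidate adenine in each.

```python
import re

# IUPAC ambiguity codes needed for the motif quoted in the text:
# V = A, C, or G; B = C, G, or T.
IUPAC = {"A": "A", "T": "T", "V": "[ACG]", "B": "[CGT]"}


def motif_to_regex(motif):
    """Translate an IUPAC motif such as 'VATB' into a compiled regex."""
    body = "".join(IUPAC[ch] for ch in motif)
    # A lookahead is used so that overlapping occurrences are all reported.
    return re.compile("(?=(" + body + "))")


def find_vatb_adenines(seq):
    """Return 0-based positions of the A inside each VATB site on this strand."""
    pattern = motif_to_regex("VATB")
    # The candidate adenine is the second base of the four-base motif.
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


print(find_vatb_adenines("GATCTTAATG"))  # GATC and AATG both fit VATB
```

Whether an adenine at such a site is actually methylated is a separate question that this sketch does not address; in the study, that call comes from the sequencing kinetics.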

Kong et al. next applied 6mASCOPE to 
D. melanogaster, A. thaliana, and Homo sa- 
piens—three multicellular eukaryotes with 
reported high 6mA abundances [~700 ppm 
for D. melanogaster embryos (2), 2500 ppm 
for A. thaliana seedlings (3), and 500 to 1000 
ppm for H. sapiens lymphocytes (13) or primary
glioblastomas (14)]. They found that
bacteria in the gut of D. melanogaster or in 
the soil of A. thaliana samples, which made 
up a very small amount of the mapped reads,
accounted for the majority of 6mA quantified 
by UHPLC-MS/MS. This led to 6mA abun- 
dance in D. melanogaster and A. thaliana ge- 
nomes being quantified at ~2 or 3 ppm (near 
the limit of detection). These findings are bol- 
stered by previous work that demonstrated 
that nematode worms (Caenorhabditis el- 
egans) have substantially lower 6mA abun- 
dance (0.1 to 3 ppm) than previously esti- 
mated because of bacterial contamination in 
the gut and that zebrafish (Danio rerio) em- 
bryos have artificially increased 6mA quan- 
tifications because of bacteria adhering to 
the chorion membrane, which surrounds the
embryo, as assessed by UHPLC-MS/MS (7).

6mASCOPE performed on peripheral
blood mononuclear cells and two glioblas- 
toma brain tissue samples yielded 6mA 
abundances of 17 and 2 ppm, respectively. A 
recent study suggested that 6mA is increased 
in mammalian mitochondrial DNA (15), but
6mASCOPE also failed to detect increased 
amounts of 6mA in the mitochondrial DNA 
of human HEK293 cells. Kong et al. con- 
firmed earlier results (7, 10) that exogenous 
premethylated DNA can be incorporated into 
eukaryotic DNA and increases 6mA content. 
Together, these findings challenge high 6mA 
abundances in multicellular eukaryotes. 
Instead, 6mA is likely much rarer than pre- 
viously thought and is possibly variable be- 
tween different tissue samples or cell lines. 
It is also possible that 6mA increases only 
under specific stress conditions (15). 
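
The contamination effect described above, in which a small share of bacterial DNA accounts for most of a bulk 6mA measurement, follows from simple mixture arithmetic. The sketch below is only an illustration: the function names and every input value are hypothetical, and it assumes the contaminant's share of adenines in the sample is known.

```python
def apparent_6ma_ppm(host_ppm, contaminant_ppm, contaminant_fraction):
    """6mA/A ratio (ppm) of a mixed sample.

    contaminant_fraction is the assumed share of all adenines in the
    sample that come from contaminating DNA (between 0 and 1).
    """
    if not 0.0 <= contaminant_fraction <= 1.0:
        raise ValueError("contaminant_fraction must be between 0 and 1")
    host_fraction = 1.0 - contaminant_fraction
    return host_fraction * host_ppm + contaminant_fraction * contaminant_ppm


def contaminant_share(host_ppm, contaminant_ppm, contaminant_fraction):
    """Fraction of the bulk 6mA signal that originates from the contaminant."""
    total = apparent_6ma_ppm(host_ppm, contaminant_ppm, contaminant_fraction)
    if total == 0:
        return 0.0
    return contaminant_fraction * contaminant_ppm / total


# Hypothetical inputs: host genome at 2 ppm, bacterial DNA at 20,000 ppm,
# bacteria contributing 1% of the adenines in the sample.
bulk = apparent_6ma_ppm(2.0, 20000.0, 0.01)
share = contaminant_share(2.0, 20000.0, 0.01)
print(f"bulk 6mA/A: {bulk:.1f} ppm; from contaminant: {share:.1%}")
```

With these made-up inputs, the bulk value comes out near 200 ppm even though the host genome sits at 2 ppm, which is why a method that cannot assign 6mA to its source can badly overestimate it.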

6mASCOPE’s limit of detection (~1 to 10 
ppm) makes it hard to conclude whether 
estimated 6mA abundances of 2 to 3 ppm 
are real and above background. These limi- 
tations can be addressed through the devel- 
opment of sequencing methods that take 
advantage of the distinct chemistry of 6mA, 
similar to bisulfite sequencing for 5-methyl- 
cytosine. Additionally, future studies should 
combine this more-rigorous 6mASCOPE and 
optimized UHPLC-MS/MS methods (7) with 
a focus on stress conditions and mitochon- 
drial DNA (15). Moreover, 6mASCOPE cannot 
discriminate potential misincorporation of 
either abundant messenger RNA containing 
6mA or foreign methylated DNA that could 
be integrated into eukaryotic DNA through 
the nucleotide salvage pathway. Combining 
rigorous detection methods with the manip- 
ulations of putative 6mA-regulating enzymes 
and directed epigenomic editing of 6mA will 
help address whether rare 6mA in metazoans 
has a functional role in specific locations in 
the genome or is randomly localized as a po- 
tential by-product of misincorporation by the 
salvage pathway.

An adaptive
device for
AI neural
networks 


The perovskite nickelate 
can transform among 
four different electronic 
components 


By Rohit Abraham John 


The human brain’s ability to maneuver
the avalanche of unstructured
data, learn from experience, and 
process information with extreme 
energy efficiency inspires the next 
generation of computing technolo- 
gies (1, 2). Neuronal plasticity is defined 
as the capability of the brain to change 
its structure and function in response to 
experience. This functional and structural 
plasticity is what researchers are trying to 
achieve in the so-called “neuromorphic” 
circuits and computer architectures (3-6). 
Specific learning rules observed in biology 
have been faithfully replicated recently in 
electrical components (7, 8). However, the 
ability for a logical device to learn and 
modify from experience, and to grow and 
shrink when required, has yet to be explicitly
demonstrated. On page 533 of this
issue, Zhang et al. (9) present highly plas- 
tic perovskite nickelate devices that can be 
electrically configured and reconfigured to 
become resistors, memory capacitors, arti- 
ficial neurons, and artificial synapses. 

The material design principle for creat- 
ing reconfigurable devices is based on pro- 
tonation-induced doping of nickelates such 
as NdNiO₃, or NNO. At room temperature,
an ideal NNO is a correlated metal, which 
means that electrons would interact among 
themselves inside the material instead of 
behaving independently. Hydrogen, an elec- 
tron donor, can be inserted into the NNO 
lattice by annealing the material in hydrogen
gas while connected to a catalytic electrode.
This process modifies the electrons’
orbital configuration in the
nickel atoms to largely reduce 
the electrical conductivity. 
Upon applying single-shot elec- 
tric pulses across the material, 
Zhang et al. could redistribute 
protons within the lattice. This 
generates a multitude of elec- 
tronic states based on the final 
distribution and local concen- 
tration of the charges. These 
metastable states can then be 
configured on demand to perform
the functionalities of resistors,
memory capacitors, neurons,
and synapses, a first for
a single device (see the figure).

In comparison with oxide- 
based devices that rely on the 
migration of oxygen vacancies 
through the nickelate lattice, 
the proton-redistribution ap- 
proach used by Zhang et al. 
enables a larger charge modu- 
lation at a faster time scale because of 
the much smaller ionic radius of protons 
as compared with those of oxygen atoms. 
Moreover, the devices are formed by using 
semiconductor foundry-compatible tech- 
niques and on substrates compatible with 
complementary metal-oxide semiconduc- 
tor (CMOS) circuits, making this a prom- 
ising “lab-to-fab” technology and immedi- 
ately pertinent to the electronics industry. 

All electronic materials have defects that 
can create a barrage of metastable conduc- 
tivity states. As a result, the value of this re- 
search discipline lies in the ability to identify 
and engineer functional states for specific 
applications. Zhang et al. discovered in their 
device many metastable configurations, each 
having distinct electrical and chemical sig- 
natures. The authors first explore the possi- 
ble configurations using theoretical calcula- 
tions and subsequently confirm them using 
a combination of resistance and capacitance 
measurements, Raman spectroscopy, and 
scanning near-field optical microscopy. 

The large pool of possible electronic 
states available within a single device 
would help to enable the implementa- 
tion of reservoir computing frameworks 
in hardware. Reservoir computing is a 
computational framework inspired by a 
specific type of neural network theory 
that maps input signals into higher-di- 
mensional computational spaces by using 
the dynamics of a fixed, nonlinear system 
known as a reservoir. The hydrogen-doped 
NNO of Zhang et al. is a strong candidate
to be used as nodes for reservoir computing
in hardware. Each node has two nonlinear
components: the memristor (a device
that combines the functions of a memory
and a resistor) and the similarly named
memcapacitor. Together, the two components
represent an internal state and outperform
theoretical reservoirs on several
classification tasks in terms of improved
accuracy and faster convergence.

Hydrogen-doped perovskite nickelate as a versatile reconfigurable platform
By applying electric pulses, the hydrogen ions in the nickelate lattice can occupy
metastable states and enable distinct functionalities. This allows the same device
to be reconfigured on demand as a resistor, a memory capacitor, an artificial
neuron, or an artificial synapse.

A reconfigurable device may also help
realize grow-when-required (GWR) networks,
which are artificial neural networks
in which the system can, as the name suggests,
grow when required. In more technical
terms, this means that when the input to
the GWR network does not achieve a certain
threshold of activity, the system automatically
creates a new node. Through the
capacity to grow on demand, the network 
overcomes problems caused by resource de- 
pletion—a common problem for static com- 
putational networks. Similarly, the network 
can also shrink its size if inactive nodes are 
detected, saving operational costs. According 
to theoretical models created by Zhang et 
al., their GWR network outperforms static
networks by up to 250% in accuracy for
incremental learning scenarios. The reconfigurability
offered by the device further expands
the efficiency and reliability of a GWR system, at least
in theory, and will help realize 
more dynamic architectures for 
continual learning. 
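
The grow-and-shrink behavior described above can be sketched in a few lines. This is a deliberately simplified illustration, not the model used by Zhang et al. and not a full GWR algorithm (which also tracks habituation and connections between nodes). The class name, the exponential activity function, and the `activity_threshold` and `max_idle` parameters are all assumptions made for the example.

```python
import math


class GrowWhenRequired:
    """Toy network that adds a node when no existing node responds strongly."""

    def __init__(self, activity_threshold=0.8, max_idle=3):
        self.nodes = []  # prototype vectors, one per node
        self.idle = []   # inputs seen since each node last won
        self.activity_threshold = activity_threshold
        self.max_idle = max_idle

    @staticmethod
    def _activity(node, x):
        # Activity is 1.0 for a perfect match and decays with distance.
        return math.exp(-math.dist(node, x))

    def observe(self, x):
        """Present one input; grow if the best activity is below threshold."""
        x = list(x)
        best, best_act = None, 0.0
        for i, node in enumerate(self.nodes):
            act = self._activity(node, x)
            if act > best_act:
                best, best_act = i, act
        if best is None or best_act < self.activity_threshold:
            self.nodes.append(x)  # grow: the input becomes a new node
            self.idle.append(0)
            best = len(self.nodes) - 1
        for i in range(len(self.idle)):
            self.idle[i] = 0 if i == best else self.idle[i] + 1
        return best

    def prune(self):
        """Shrink: drop nodes that have been idle for more than max_idle inputs."""
        keep = [i for i, idle in enumerate(self.idle) if idle <= self.max_idle]
        self.nodes = [self.nodes[i] for i in keep]
        self.idle = [self.idle[i] for i in keep]


net = GrowWhenRequired(activity_threshold=0.8, max_idle=2)
for sample in ([0, 0], [0.1, 0], [5, 5], [5, 5], [5, 5], [5, 5]):
    net.observe(sample)
net.prune()
print(len(net.nodes))  # the node near the origin has gone idle and is removed
```

In hardware, growing and pruning would correspond to reconfiguring devices into or out of the network, which is the step the text identifies as still unresolved.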

The reconfigurable device 
by Zhang et al. represents a 
substantial advance by having 
multiple neuronal and synap- 
tic functionalities embedded 
within a single device. This can 
enable compact and energy- 
efficient neuromorphic system 
designs of reservoir computing 
frameworks and dynamic neu- 
ral networks. However, to bring 
this vision to practical hard- 
ware implementation, research- 
ers still have to find answers to 
many questions, such as how 
to deal with the nonuniformity 
of the devices, how to make the 
device connect to or disconnect 
from the neural network, how to 
rearrange the connections when the device is 
reconfigured from one function to another, 
and how to determine the role of each device 
and apply the correct voltage scheme on it. 

The electrical circuits in use today are 
designed with multiple passive compo- 
nents such as resistors, capacitors, and 
inductors and active devices such as tran- 
sistors. With the discovery of memristors, 
circuit designers now have an extra degree 
of freedom (10-12) when designing power- 
efficient, high-performance systems. 
However, from a material implementation 
perspective, the construction of these com- 
ponents still requires complex assembly of 
various conductive, semiconductive, and 
insulating materials. The ability to imple- 
ment almost all of these elements with a 
single material platform can substantially 
change electronics. Hence, such reconfigu- 
rable electronic devices could have positive 
implications beyond neuromorphic computing
and machine intelligence.

VIEWPOINT: COVID-19


Lethal mutagenesis as an antiviral strategy 


Lethal mutagenesis of RNA viruses is a viable antiviral strategy but has unknown risks 


By Ronald Swanstrom and
Raymond F. Schinazi


Viruses depend on the host cell to carry
out much of their replication, with
each offering only a few virus-specific
targets for the development of antiviral
therapies. This makes the development
of broadly active antivirals
difficult to conceptualize. Numerous RNA 
viruses—including severe acute respiratory 
syndrome coronavirus 2 (SARS-CoV-2), Zika 
virus, and Chikungunya virus—have led to 
recent epidemics, highlighting the need for 
effective antiviral drugs that can be enlisted 
quickly. Some years ago, a broadly applicable 
antiviral strategy was proposed in which a 
slight increase in the error rate of a rapidly 
replicating RNA virus would overwhelm the 
capacity to remove deleterious mutations, 
driving the viral population to extinction; 
this strategy is called lethal mutagenesis 
(1). Although the antivirals ribavirin and fa- 
vipiravir were developed with this strategy in 
mind, the recent development of the much 
more potent molnupiravir to treat SARS- 
CoV-2 highlights the unknown risks to the 
host that this strategy entails. 

The genome size of an organism is in- 
versely related to the error rate during rep- 
lication, and this holds true for small RNA 
viruses with genomes of 7 to 30 kb (2). For 
RNA viruses, this translates into one nucle- 
otide substitution for every two to three ge- 
nomes synthesized. Most mutations are del- 
eterious, but a subset of mutations will give 
rise to potentially useful phenotypic diver- 
sity, which may undergo selection. Lethal 
mutagenesis is a universal antiviral strat- 
egy for RNA viruses (especially those that 
cause acute disease) because they all have 
the same vulnerabilities of small genomes 
and rapid replication, making them highly 
sensitive to an increased mutation rate. 
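
The arithmetic behind this vulnerability can be sketched as follows. The example is illustrative only. It assumes that new substitutions per copied genome follow a Poisson distribution, picks a 10-kb genome from the 7-to-30-kb range quoted above, takes "one substitution per two to three genomes" as 0.4 substitutions per genome, and uses a purely hypothetical fivefold increase in error rate to stand in for a mutagen.

```python
import math


def mutations_per_genome(per_site_rate, genome_length):
    """Expected number of new substitutions in one copied genome."""
    return per_site_rate * genome_length


def fraction_unmutated(per_site_rate, genome_length):
    """Poisson probability that a copied genome carries no new substitution."""
    return math.exp(-mutations_per_genome(per_site_rate, genome_length))


GENOME_LENGTH = 10_000   # nucleotides (assumed, within the 7 to 30 kb range)
BASELINE_RATE = 4e-5     # per site per copy, i.e. 0.4 substitutions per genome
MUTAGEN_FOLD = 5         # hypothetical increase caused by a mutagenic analog

for label, rate in (("baseline", BASELINE_RATE),
                    ("with mutagen", BASELINE_RATE * MUTAGEN_FOLD)):
    expected = mutations_per_genome(rate, GENOME_LENGTH)
    clean = fraction_unmutated(rate, GENOME_LENGTH)
    print(f"{label}: {expected:.1f} substitutions per genome, "
          f"{clean:.0%} of copies unmutated")
```

Under these assumptions, about two-thirds of copies are error-free at baseline, and that share falls to roughly one in seven after the hypothetical fivefold increase. Because the effect compounds over each round of copying, a modest rise in error rate can have a large effect on the viral population.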

The strategy for increasing the rate of new 
mutations in RNA viruses is to design ribo- 
nucleoside analogs that can be metabolized 
to ribonucleoside triphosphates in cells and 
then be incorporated into the viral genome 
during viral RNA synthesis. The design of the
analog allows the base portion of the ribonucleotide
to base pair ambiguously during
subsequent RNA synthesis. Thus, once
incorporated into viral RNA, the analog 
will base pair with one of several natural 
nucleotides during RNA synthesis, leading 
to a mutation. RNA viruses synthesize com- 
plementary plus and minus strands of RNA 
during viral replication and do this multiple 
times. For example, it is estimated that the 
poliovirus RNA genome undergoes five con- 
secutive rounds of replication within a cell 
before new virus particles are released (3). 
As the viral RNA genome is amplified in the 
cell, the effects of the mutagen are concen- 
trated in the viral genome. 
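
As a rough illustration of why a modest rise in error rate matters, the sketch below treats the number of new substitutions per genome copy as Poisson distributed. The baseline of about 0.4 substitutions per copy (one per two to three genomes) and the five rounds of replication come from the text; the Poisson model, the threefold increase, and the function names are illustrative assumptions rather than measured values.

```python
import math

def fraction_unmutated(mean_mutations_per_genome: float) -> float:
    """Poisson probability that one genome copy carries zero new substitutions."""
    return math.exp(-mean_mutations_per_genome)

def fraction_after_rounds(mean_mutations_per_genome: float, rounds: int) -> float:
    """Chance that a lineage stays substitution-free through several copying rounds."""
    return fraction_unmutated(mean_mutations_per_genome) ** rounds

baseline = 0.4               # about one substitution per 2.5 genomes (from the text)
with_mutagen = 3 * baseline  # hypothetical threefold increase caused by an analog

for label, rate in [("baseline", baseline), ("with mutagen", with_mutagen)]:
    print(label, round(fraction_after_rounds(rate, rounds=5), 4))
```

Under these assumptions, the share of lineages that pass through five rounds without a new substitution drops sharply when the per-copy rate rises, which is the sense in which repeated replication concentrates the effect of the mutagen.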

The first ribonucleoside analog that was 
identified as capable of inducing mutations 
in an RNA virus was the purine analog riba- 
virin, which forms base pairs as either adeno- 
sine or guanosine when used at high concen- 
trations in human cells in vitro (4). Ribavirin 
has pleiotropic effects on the cell, and its limited
antiviral effect in vivo is by an uncertain
mechanism (5). Favipiravir is a base analog 
that is metabolized to a ribavirin-like mole- 
cule in the cell. It is approved for use against 
influenza virus infection in Japan, and it has 
been shown to be antiviral and mutagenic 
against SARS-CoV-2 when used at high doses 
in an animal model (6, 7). Favipiravir is now 
being evaluated in multiple human trials to 
treat COVID-19. 

A significantly more potent antiviral drug 
that mediates lethal mutagenesis has re- 
cently come to the forefront as a potential 
antiviral in the current SARS-CoV-2 pan- 
demic—molnupiravir (8, 9). This is an orally 
available 5’-isobutyl form of the cytidine 
analog β-D-N⁴-hydroxycytidine (NHC) (10).
This molecule contains an additional oxygen 
atom in the extra-ring amino group at posi- 
tion four of the cytidine base. In this position, 
the oxygen destabilizes a hydrogen atom, also 
bound to this extra-ring nitrogen, leading to 
migration back and forth with the ring po- 
sition three nitrogen; this changes the base- 
pairing properties back and forth between 
uridine and cytidine (11, 12) (see the figure). 
In uridine, position four in the ring of the 
base has an extra-ring oxygen as a carbonyl, 
suggesting that RNA synthesis is relatively in- 
sensitive to the chemical composition at this 
position (aside from its role in base pairing). 
This highlights why NHC should be readily 
metabolized by the cell. In a cell culture-based
assay, NHC was 100 times more potent
as an inhibitor of SARS-CoV-2 than ribavirin 
or favipiravir (13). Molnupiravir was effica- 
cious in mouse models of respiratory SARS- 
CoV and Middle East respiratory syndrome 
coronavirus (MERS-CoV) infection (9), con- 
sistent with NHC having broad antiviral ac- 
tivity (10). 

A recently reported clinical trial of mol- 
nupiravir showed a 30% reduction in hos- 
pitalization when people with symptomatic 
SARS-CoV-2 infection (and at risk for more 
serious disease) were treated with molnu- 
piravir within the first 5 days of symptoms 
(14). Based on these results, the US Food 
and Drug Administration (FDA) has ap- 
proved an emergency use authorization 
(EUA) for molnupiravir to treat symptom- 
atic SARS-CoV-2 infections. Molnupiravir 
has also been approved for the treatment 
of COVID-19 in the United Kingdom, and 
there are expectations that it will be made 
widely available around the world. 

However, the antiviral strategy of lethal 
mutagenesis comes with a cautionary note. 
Ribonucleosides must be phosphorylated to 
the 5’-triphosphate form to be substrates for 
RNA synthesis (host or viral). Ribonucleosides 
synthesized by the host cell are formed as 
the 5’-monophosphate. Ribonucleoside ana- 
logs enter this biosynthetic pathway through 
phosphorylation by a salvage kinase to form 
the 5’-monophosphate (see the figure). The 
ribonucleoside 5’-monophosphate is phos- 
phorylated to the ribonucleoside 5'-diphos- 
phate and then to the 5’-triphosphate (now 
ready for RNA synthesis). The ribonucleoside 
5'-diphosphate is the obligatory intermedi- 
ate in this pathway, which creates a potential 
problem. Ribonucleoside 5’-diphosphate is 
also the obligatory intermediate in the syn- 
thesis of the 2'-deoxyribonucleoside 5'-di- 
phosphate that is on the pathway to form 
2'-deoxyribonucleoside 5'-triphosphates, 
which are used in DNA synthesis. The en- 
zyme ribonucleotide reductase (RNR) is re- 
sponsible for this reaction. Thus, there is a 
clear metabolic pathway for a mutagenic ri- 
bonucleoside analog to become a precursor 
for host DNA synthesis. 
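
The routes described above can be written down as a small directed graph. The sketch below is only a bookkeeping aid: the node names follow the abbreviations used in the text, while the generic enzyme labels ("kinase", "RNA polymerase", "DNA polymerase") and the helper name `find_route` are assumptions added for illustration.

```python
from collections import deque

# Edges follow the steps described in the text; the second item is the enzyme label.
PATHWAY = {
    "analog": [("rNMP", "salvage kinase")],
    "rNMP": [("rNDP", "kinase")],
    "rNDP": [("rNTP", "kinase"), ("dNDP", "RNR")],
    "rNTP": [("RNA", "RNA polymerase")],
    "dNDP": [("dNTP", "kinase")],
    "dNTP": [("DNA", "DNA polymerase")],
}

def find_route(start, goal):
    """Breadth-first search returning the intermediates from start to goal."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt, _enzyme in PATHWAY.get(path[-1], []):
            queue.append(path + [nxt])
    return None

print(" -> ".join(find_route("analog", "RNA")))
print(" -> ".join(find_route("analog", "DNA")))
```

Both routes share the rNDP node, which is the branch point at which RNR can divert an analog toward DNA precursors.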

Molnupiravir was shown to be positive in
the bacterial Ames test (an assay that measures
mutagenic potential), whereas two animal
model assays of mutagenic potential were
largely negative, leading the FDA to state in
the EUA fact sheet that “molnupiravir is low
risk for genotoxicity” (15). However, the ability
of the molnupiravir metabolite NHC to
transit the RNR pathway was demonstrated
in a cell culture-based assay of mammalian
cell mutagenesis (13), raising questions about
which assays should be used for evaluating
the risk of mutagenesis in humans.

[Figure: Mutagenesis with ribonucleoside analogs]
Antiviral ribonucleoside analogs, such as NHC (molnupiravir), RBV, and FAV, transit the ribonucleotide biosynthetic pathway and become the substrate
for host and viral RNA synthesis. They may also appear in the 2’-deoxyribonucleotide pathway owing to the activity of RNR.

[Pathway diagram not reproduced: NHC, RBV, and FAV are phosphorylated by kinases to the rNMP, which is converted to the rNDP.]

Ribonucleoside analogs converge at the rNMP, which is metabolized to the rNDP and then to the rNTP
to become the substrate for host and viral RNA synthesis. However, the rNDP is also the substrate for the
synthesis of the DNA precursor dNDP.

dNDP, 2'-deoxyribonucleoside 5'-diphosphate; dNTP, 2'-deoxyribonucleoside 5'-triphosphate; FAV, favipiravir; NHC, β-D-N⁴-hydroxycytidine;
pol, polymerase; RBV, ribavirin; rNDP, ribonucleoside 5'-diphosphate; rNMP, ribonucleoside 5'-monophosphate; RNR, ribonucleotide reductase; rNTP, ribonucleoside 5'-triphosphate.

There is a gap in our knowledge in scaling 
short-term lab-based assays (using bacteria, 
animal cells, and animal models) for muta- 
genic activity with long-term risk to human 
health. Mutagens that are incorporated dur- 
ing cellular DNA synthesis are problematic 
for a developing fetus (where cells are un- 
dergoing rapid division), male germline cells 
(which continue to divide throughout life), 
and cancer risk (where the small fraction of 
human cells that are dividing have the po- 
tential to incorporate a mutation that could 
contribute to cancer development). Humans 
are exposed to mutagens throughout life—for 
example, DNA mutations are induced by x- 
ray imaging or during air travel—so there are 
levels of DNA damage that are considered 
to be largely inconsequential. If the molnu- 
piravir metabolite NHC really is a mutagen 
in dividing animal cells, how should negative 
data in an animal model be interpreted? Are 
such negative data sufficient to ensure long- 
term safety in humans, or does the lack of 
knowledge about the link between negative 
results in animal assays and long-term out- 
comes in human health need to be acknowl- 
edged? Molnupiravir use will come with 
some restrictions around short-term risks as- 
sociated with reproductive health, but it may 
take years before potential long-term risks 
are understood. The best outcome, which is 
the assumption from the negative results in 
animals, is that molnupiravir treatment falls 
within the background level of exposure to 
mutagens that humans already experience 
and tolerate. The half-life of molnupiravir 
metabolites in human tissue is unknown. 

By definition, lethal mutagenesis will 
cause increased sequence diversity within
the viral population. This has raised the issue
of whether the intentional introduction
of sequence diversity will speed up viral evo- 
lution, with the specific concern being anti- 
body escape mutants that would undermine 
vaccine efforts. Adding random mutations 
at a density of 1 per 1000 bases of the viral 
genome is sufficient to reduce infectivity of 
the viral population in the range of 100-fold, 
as shown for poliovirus and SARS-CoV-2 (4, 
13). Treatment with molnupiravir modestly 
reduces the shedding of viral RNA and signif- 
icantly reduces the infectiousness of SARS- 
CoV-2 in patients with COVID-19 (8, 14). 
Thus, during successful treatment and clear- 
ance of the virus, the potential for evolution 
would appear minimal. However, for people 
who fail to clear the virus and maintain a 
persistent infection, whether treatment with 
molnupiravir will affect the course of viral 
evolution remains unknown. Similarly, at- 
tempts to treat patients with a combination 
of molnupiravir and the SARS-CoV-2 prote- 
ase inhibitor nirmatrelvir should carefully 
follow any sequence changes within the viral 
3CL protease coding domain to assess the po- 
tential evolution of resistance. 
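
The figures of 1 mutation per 1000 bases and a roughly 100-fold loss of infectivity can be unpacked with a deliberately simple model in which every added mutation independently leaves a genome infectious with the same probability w. The genome lengths used here (about 7.5 kb for poliovirus and about 30 kb for SARS-CoV-2), the independence assumption, and the function name are illustrative choices, not results from the cited studies.

```python
def per_mutation_survival(genome_kb: float, fold_reduction: float = 100.0,
                          mutations_per_kb: float = 1.0) -> float:
    """Solve w ** n = 1 / fold_reduction for w, where n is mutations per genome."""
    n = genome_kb * mutations_per_kb
    return (1.0 / fold_reduction) ** (1.0 / n)

for name, kb in [("poliovirus", 7.5), ("SARS-CoV-2", 30.0)]:
    w = per_mutation_survival(kb)
    print(f"{name}: {kb:g} mutations per genome, w = {w:.2f}")
```

In this toy model, w comes out near 0.54 for the shorter genome and near 0.86 for the longer one, so the same overall loss of infectivity is compatible with quite different average costs per mutation.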

There is a desperate need to make effica- 
cious SARS-CoV-2 treatments widely avail- 
able, to develop new broadly active antiviral 
treatments to allow rapid response to new 
SARS-CoV-2 variants, and, more generally, to 
be able to respond to new RNA virus epidem- 
ics. Molnupiravir has the potential to lower 
the disease burden of SARS-CoV-2 infections 
and help contain future emerging RNA vi- 
ruses. However, how can its potential long- 
term effects as a mutagen be assessed? The 
following steps are suggested: Treatment 
should be restricted to those who will benefit 
the most, such as those who cannot tolerate 
other available treatments, those who have a 
preexisting condition that enhances the risk 
of COVID-19, and those who are more than 
50 years of age and would be less affected 
by a potential long-term risk of cancer or

[Figure panel: Mutations are introduced during viral replication when the
NHC-derived ribonucleotide in viral RNA is recognized as
either cytidine or uridine owing to ambiguous base pairing.]

reproductive risks. A registry of a cohort of 
people who received molnupiravir should be 
kept to longitudinally monitor the frequency 
of cancer and other potential outcomes so 
that the opportunity to understand the risk 
(or lack thereof) associated with the use of 
a mutagenic ribonucleoside as an antiviral is 
not missed. Strategies to limit metabolism of 
mutagenic analogs from the ribonucleotide 
pool into the 2’-deoxyribonucleotide pool 
should be explored to limit the potential DNA 
mutation load in the host. In addition, the vi- 
ral population diversity should be evaluated 
after treatment with molnupiravir in those 
who fail to clear the virus to see whether the 
treatment accelerates viral evolution. Lethal 
mutagenesis has the potential to be an im- 
portant antiviral strategy for RNA viruses, 
especially in emerging infections when there 
is an absence of virus-specific antivirals. The 
potential of this strategy should be exploited, 
but the possible risks should be acknowledged
and addressed.

RETROSPECTIVE


Robert H. Grubbs (1942-2021) 


Brilliant organic chemist and inspiring mentor 


By Melanie Sanford 


Robert Howard (Bob) Grubbs died on
19 December 2021. He was 79. Best
known for developing catalysts that
revolutionized the way organic and
polymer chemists put molecules together,
Bob was awarded the 2005
Nobel Prize in Chemistry for this work, along 
with Yves Chauvin and Richard Schrock. 
Grubbs was a rare example of a brilliant sci- 
entist who was also a true mensch. His most 
enduring legacy will be the generous friend- 
ship, support, and encouragement that he 
gave to colleagues, trainees, and others over 
the course of his distinguished career. 

Born on 27 February 1942 in rural 
Kentucky, Grubbs was fascinated throughout 
his youth by tinkering, building, and trying 
to understand how things work. Grubbs ini- 
tially planned to pursue agricultural chemis- 
try, but after an early research experience, he 
changed his focus to organic chemistry. After 
completing his BS (1963) and MS (1965) de- 
grees at the University of Florida, he earned 
a PhD in 1968 at Columbia University study- 
ing with chemist Ronald Breslow and com- 
pleted a National Institutes of Health post- 
doctoral fellowship at Stanford University 
working with chemist James Collman. This 
training instilled a tremendous foundation 
in organic and inorganic reaction mecha- 
nisms and spurred his lifelong interest in 
organometallic chemistry. Grubbs started 
his independent career at Michigan State 
University in 1969. After 9 years, he moved 
to the California Institute of Technology 
(Caltech), where he maintained an active re- 
search group until his death. 

Grubbs spent nearly his entire career 
studying a chemical reaction known as olefin 
metathesis. This reaction involves the break- 
ing and forming of carbon-carbon double 
bonds (C=C), which are some of the strongest 
bonds present in organic molecules. Olefin 
metathesis was discovered in the 1950s, but 
the original catalysts were poorly defined, 
and their mechanisms were not well under- 
stood. In his early work, Grubbs focused on 
addressing these challenges by pursuing dis- 
crete catalysts for this transformation and 
interrogating the mechanism of the key C=C 
bond cleavage and formation step.
His work in the early 1980s studying a titanium
catalyst known as the Tebbe reagent
was critical in shaping his research trajectory. 
It led him to recognize the power of well- 
defined catalysts for achieving controlled and 
predictable reactivity. However, the instabil- 
ity of this titanium catalyst toward common 
functional groups (Lewis bases, air, and wa- 
ter) inspired his pursuit of more practical 
and functional group tolerant catalysts. This 
laid the groundwork for his invention of the 
series of ruthenium catalysts that bear his 
name. These ruthenium carbene complexes 
are remarkable because they can be handled 
on the benchtop and are highly selective for 
C=C bonds over more Lewis basic groups. 

Grubbs was the central force both in de- 
veloping the fundamental science behind 
ruthenium olefin metathesis catalysts and 


\ 

” \ 

in pioneering their translation to com- 
mercial applications. In 1998, he founded 
the company Materia, which scaled up the 
synthesis of these catalysts and made them 
widely available to the academic and indus- 
trial chemistry communities. Since then, 
the Grubbs catalysts have been used for the 
construction of hepatitis C drugs and in the 
development of biorefineries that convert 
plant oils into higher-value chemicals. They 
are also widely used for commercial produc- 
tion of polydicyclopentadiene, a moldable 
high-performance material with exceptional 
impact and corrosion resistance. 

Grubbs delighted not only in field- 
changing scientific discoveries but also in 
the more mundane aspects of academic 
research. Whereas most faculty covet new
laboratory equipment, during my time in his
research group from 1997 to 2001, Grubbs 
was proudest of his decades-old gloveboxes 
and gel permeation chromatographs. He 
regularly came through the lab to marvel 
at the spectacular red, orange, purple, and 
green colors of newly synthesized ruthenium 
complexes. He was passionate about demon- 
strations for his introductory organic chem- 
istry course and loved to practice exploding 
hydrogen balloons (trying to get the biggest 
bang) and making polymers (trying to get 
the most dramatic polymerization). 

A man of wide-ranging interests outside of
the lab, Grubbs took frequent rock climbing 
excursions to Joshua Tree National Park and 
eagerly anticipated annual group camping 
trips in northern California. He was a pas- 
sionate basketball player and fan. He could 
frequently be found on the sidelines of games 
at Caltech, at Yale (where his daughter, Katy, 
starred), and at the Staples Center, where he 
was famously photographed sitting courtside 
a few seats away from the late Kobe Bryant. 
His love of sports endeared him to his col- 
leagues and also their children; indeed, my 
14-year-old son has wonderful memories of 
a pickup basketball game that they played. 

Beyond his scientific accomplishments, 
Grubbs’s legacy is the people that he trained, 
mentored, and encouraged. He advised more 
than 300 graduate students and postdoctoral 
fellows, who have gone on to careers in aca- 
demia, industry, law, and beyond. Grubbs was 
a fabulous mentor, offering trainees a balance 
of scientific vision and intellectual freedom 
that allowed us to discover and explore sci- 
entific directions that matched our passions. 
He encouraged his students to work hard 
and play hard, and he was a bemused spec- 
tator (and sometimes participant) in raucous 
St. Patrick’s Day parties, Kit Kat tasting con- 
tests, and foosball competitions. 

Grubbs was exceedingly generous in ac- 
cepting seminar invitations, squeezing his 
six-foot-six frame into economy airplane 
seats to travel around the world. As such, 
thousands of scientists had the opportunity 
to interact with him throughout his career. 
Those visits were truly memorable for par- 
ticipants. Grubbs filled his talks with folksy 
words of wisdom, delivered in his character- 
istic mumble. After his talks, he was just as 
eager to interact with junior scientists as he 
was with established professors. Hundreds 
of these interactions have been shared over 
the past month in an outpouring of photos 
and memories on social media. Overall, Bob 
Grubbs will be remembered as a brilliant 
chemist who had an outsized impact on the 
people around him through his love of science,
love of life, and generosity of spirit.

How NFTs could transform
health information exchange 


Can patients regain control over their health information? 


By Kristin Kostick-Quenet, Kenneth D.
Mandl, Timo Minssen, I. Glenn Cohen,
Urs Gasser, Isaac Kohane, Amy L. McGuire


Personal (sometimes called “protected”) health
information (PHI) is highly valued (1) and will
become centrally important as big data and
machine learning move

to the forefront of health care and 
translational research. The current health 
information exchange (HIE) market is 
dominated by commercial and (to a lesser 
extent) not-for-profit entities and typically 
excludes patients. This can serve to undermine
trust and create disincentives for sharing
data (2). Patients have limited agency
in deciding which of their data is shared, 
with whom, and under what conditions. 
Within this context, new forms of digital 
ownership can inspire a digital market- 
place for patient-controlled health data. 
We argue that nonfungible tokens (NFTs) 
or NFT-like frameworks can help incentiv- 
ize a more democratized, transparent, and 
efficient system for HIE in which patients 
participate in decisions about how and 
with whom their PHI is shared. 

NFTs grew out of the concept of “tokens” 
in gaming, whereby a “fungible” token can 
be used to purchase a thing of value (e.g., 
a gold coin to spend on a superpower) but 
an NFT can only be traded, given its intrinsic
value that is not directly comparable
to that of another token (e.g., a specific 
sword versus a specific tapestry). NFTs 
have evolved into digital contracts com- 
posed of metadata to specify access rights 
and terms of exchange. Their nature as 
metadata means that NFTs point to digi- 
tal content but are not the content itself. 
The use of NFTs as a tool for digital artists 
to prevent the unsanctioned circulation of 
artwork online has since bled into sports, 
entertainment, and even health care, com- 
modifying digital information and creating 
a multi-billion-dollar market. 


NFTs are made up of a unique 40-digit 
identification code (“hash”) and a uniform 
resource locator (URL) linking to the con- 
tent online, forming a “smart contract” 
that can range from just a few lines of 
computer code to a more elaborate set of 
instructional code. These contracts desig- 
nate a patient-controlled copy of the digi- 
tal data and the terms under which they 
can be accessed and used, using pseud- 
onyms that permit deidentification while 
ensuring transparency and accountability. 
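
The structure described above (an identifier, a pointer to the content, and terms of use) can be sketched as a small record. This is a minimal illustration, not a real NFT standard: the class and field names, the example URL, and the pseudonym are hypothetical, and SHA-1 is used only because its hexadecimal digest happens to be 40 characters long.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass(frozen=True)
class HealthDataToken:
    """Metadata only: points to the data and states who may use it."""
    content_hash: str       # 40 hex digits identifying the data
    url: str                # where the (encrypted) data lives
    owner_pseudonym: str    # pseudonym, not the patient's name
    allowed_uses: frozenset = field(default_factory=frozenset)

def make_token(data: bytes, url: str, owner_pseudonym: str, allowed_uses) -> HealthDataToken:
    # Assumption: SHA-1 chosen for its 40-character hex digest; the article
    # does not say which hash function real NFTs use.
    digest = hashlib.sha1(data).hexdigest()
    return HealthDataToken(digest, url, owner_pseudonym, frozenset(allowed_uses))

token = make_token(b"example record", "https://example.org/records/1",
                   "patient-7f3a", {"research"})
print(len(token.content_hash), sorted(token.allowed_uses))
```

The record holds no health data itself, which mirrors the point that an NFT is metadata pointing to content rather than the content.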

NFTs are created by “minting” digital 
content on a blockchain. Minting involves 
uploading and having other computers 
verify and time-stamp the content, loca- 
tion, and originator of digital informa- 
tion, and all subsequent transactions 
are recorded on a digital ledger that is 
distributed across a network of comput- 
ers. Redundancy, along with the compu- 
tational difficulty and processing energy 
required, makes it difficult to tamper with 
the transaction record. Blockchains, some- 
times referred to as “trustless” systems, 
provide a verifiable infrastructure to man- 
age digital assets. 
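
The tamper resistance of such a ledger rests partly on each entry committing to the one before it. The sketch below shows only that hash-chaining idea on a single machine; it leaves out the distributed network, consensus, and computational cost that real blockchains rely on, and the `Ledger` class and event names are hypothetical.

```python
import hashlib
import json
import time

class Ledger:
    """Append-only list in which every entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, payload: dict) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"payload": payload, "prev_hash": prev_hash, "timestamp": time.time()}
        body["hash"] = self._digest(body)
        self.entries.append(body)
        return body

    @staticmethod
    def _digest(body: dict) -> str:
        # Hash the payload, the previous hash, and the time stamp together.
        fields = {k: body[k] for k in ("payload", "prev_hash", "timestamp")}
        return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()

    def is_valid(self) -> bool:
        prev_hash = "0" * 64
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True

ledger = Ledger()
ledger.append({"event": "mint", "originator": "hospital-A"})
ledger.append({"event": "access", "requester": "lab-B"})
print(ledger.is_valid())   # True
ledger.entries[0]["payload"]["originator"] = "someone-else"
print(ledger.is_valid())   # False: the stored hash no longer matches
```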

There are many situations in which the 
distinctive features of NFTs could pro- 
vide potential advantages over, and help 
address gaps in, the existing HIE sys- 
tem. For example, in 2020, the US Office 
of the National Coordinator for Health
Information Technology (ONC) issued 
a rule as part of the 21st Century Cures 
Act that provided a technical basis for 
an individual to assert the right of access 
to a computable version of their medi- 
cal record. By the end of 2022, all certi- 
fied PHI technology will need to support 
Substitutable Medical Applications and 
Reusable Technologies (SMART) on the 
Fast Healthcare Interoperability Resources 
(FHIR) application programming inter- 
face (API), which allows third-party apps 
to connect with and request data from 
health care provider electronic health
records (EHRs) (3). To ease the cumbersome
manual process of extracting data
from EHRs, the SMART on FHIR API sup- 
ports an ecosystem of patient-facing apps 
that could serve to enhance patient agency 
in directing sharing of their data. Although 
this represents a leap forward in engaging 
patients in HIE, the largely commercial na- 
ture of the marketplace may still serve to 
undermine trust and create disincentives 
for sharing PHI. 


POTENTIAL BENEFITS 
Automating data access and control 
At least two scenarios could be realized 
under an NFT or NFT-like framework for 
personal control of health data. In the 
first, PHI would be uploaded (or “minted”) 
as a distinct, “original” version, with the 
generator or custodian (e.g., EHR com- 
pany, hospital, biobank, etc.) required to 
register each new datum (e.g., diagnosis 
of illness, prescription) on a public block- 
chain. The encrypted PHI would only be 
accessible to those given explicit permis- 
sion in a smart contract. Mann et al. (4) 
proposed such a scheme using blockchain 
and smart contracts to “prosent” (proac- 
tively consent) pseudonymously to data 
release or exchange for certain uses. In 
this way, patients could specify in advance 
with whom they agree to share data with- 
out needing to consent to every transac- 
tion, enabling greater patient control and 
more timely and efficient data exchanges. 
In cases where patients might prosent to 
the sale of their PHI or to participation in 
clinical trials offering compensation, the 
smart contract could allow for automated 
distribution of funds to the patient. If the 
patient ever wanted to modify the terms of 
the contract, each change of terms would 
be immutably stored as distinct, time- 
stamped events in the blockchain ledger. 
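
A minimal sketch of the first scenario, assuming a toy in-memory object in place of a real smart contract: the patient approves requesters and uses in advance, access checks are automatic, and every change of terms is appended to a time-stamped history. The class, method, and requester names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProsentContract:
    """Toy stand-in for a smart contract holding advance consent terms."""
    owner_pseudonym: str
    approved: dict = field(default_factory=dict)  # requester -> set of permitted uses
    history: list = field(default_factory=list)   # append-only change log

    def set_terms(self, requester: str, uses: set) -> None:
        # Current terms are updated, and each change is logged as a new event.
        self.approved[requester] = set(uses)
        self.history.append((datetime.now(timezone.utc), requester, frozenset(uses)))

    def may_access(self, requester: str, use: str) -> bool:
        return use in self.approved.get(requester, set())

contract = ProsentContract("patient-7f3a")
contract.set_terms("university-lab", {"research"})
contract.set_terms("university-lab", {"research", "clinical-trial"})
print(contract.may_access("university-lab", "clinical-trial"))  # True
print(contract.may_access("data-broker", "marketing"))          # False
print(len(contract.history))                                     # 2
```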
In a second, but not mutually exclusive, 
scenario, a patient’s data could stay right 
where it is (for example, a hospital data- 
base or on a patient’s phone), and the smart 
contract could “push” an algorithm to sum- 
marize or analyze the data with automated 
permission from a smart contract. This ap- 
proach is highly compatible with federated 
learning approaches that train machine- 
learning algorithms on multiple local data- 
sets without explicitly exchanging data 
samples or compiling them into a central- 
ized server. Smart contracts could help to 
realize data privacy and security goals of 
federated learning, recently credited as being
the “future of digital health” (5).
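
The second scenario can be illustrated with a bare-bones version of federated averaging, in which each site fits a model on its own records and only the fitted weight is shared. The one-parameter model, the learning rate, and the two small datasets below are invented for illustration; real federated learning systems add secure aggregation and much more.

```python
def local_update(weight, data, lr=0.1):
    """One gradient step of least-squares fitting y = w * x on a site's own data."""
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad

def federated_round(global_w, sites):
    """Each site trains locally; only updated weights (not records) are averaged."""
    updates = [(local_update(global_w, data), len(data)) for data in sites]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Hypothetical per-hospital datasets of (x, y) pairs that roughly follow y = 2x.
sites = [[(1.0, 2.1), (2.0, 3.9)], [(1.5, 3.0), (3.0, 6.2), (0.5, 1.0)]]
w = 0.0
for _ in range(50):
    w = federated_round(w, sites)
print(round(w, 2))
```

The raw (x, y) pairs never leave the site that holds them; a smart contract in this picture would decide whether `local_update` is allowed to run on a given site's data at all.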
Transparency and efficiency
By automating data-sharing agreements, 
smart contracts can address long-standing 
inefficiencies and the lack of transparency 
in HIE (6) and may give rise to a market- 
place of third-party user platforms that 
allow patients to consult an intuitive pub- 
lic ledger of transactions involving their 
PHI. With the right safeguards in place, 
patient-oriented industries might emerge 
for tracking and aggregating a patient’s au- 
thenticated data and for providing proxy 
management of smart contracts on their 
behalf. Other types of new marketplaces 
might emerge as well. Advanced data- 
sharing agreements and contracts might 
enable the securitization of multiple NFTs 
into derivatives, which might lead to in- 
novative markets. Experiences from the 
financial sector and the particular sensi- 
tivity of health applications 
point out the need for robust 
and proactive regulatory safe- 
guards of such innovation. 
Conversely, data requesters 
could benefit from easy verifi- 
cation of the authenticity and 
provenance of health data, as 
well as automated and stream- 
lined data procurement. Each 
user, for instance, a prespeci- 
fied set of research institutes, 
would be granted a particular 
access level according to the 
smart contract terms, and re- 
quests for data access could 
be made transparent. A PHI- 
specific blockchain (or set of 
interoperable blockchains) 
could ensure accountability by 
making available a patient-ac- 
cessible index of requester identities while 
maintaining the pseudonymity of patients’ 
IDs. This incorporation of “privacy by de- 
sign” automation and pseudonyms lends 
NFT-type technologies their capacity to 
deidentify patient data and allows patients 
to participate in data-sharing decisions 
while making it easier for companies and 
institutions to access information about a 
subject without strongly increasing risk of 
reidentification. At least some of the in- 
formation could be contained in metadata 
to flag certain PHI with substantial mar- 
ket value (e.g., genomic information from 
patients with rare diseases, patients on a 
newly approved medication, participants 
in particular clinical trials). 
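
One way to picture a ledger that names requesters while hiding patients is a keyed hash used as the pseudonym. The sketch below assumes the secret key is held by the patient or a trusted intermediary; the identifiers, the key, and the function names are hypothetical, and this is an illustration of the idea rather than a vetted privacy design.

```python
import hashlib
import hmac

def pseudonym(patient_id: str, secret_key: bytes) -> str:
    """Keyed hash: stable for the key holder, not reversible without the key."""
    return hmac.new(secret_key, patient_id.encode(), hashlib.sha256).hexdigest()[:16]

access_log = []  # requester identities are visible; patient identity is not

def request_access(requester: str, patient_id: str, secret_key: bytes) -> None:
    access_log.append({"requester": requester,
                       "patient": pseudonym(patient_id, secret_key)})

key = b"hypothetical-secret-held-by-patient"
request_access("research-institute-1", "MRN-000123", key)
request_access("research-institute-2", "MRN-000123", key)
print(access_log)
```

Both log entries carry the same pseudonym, so the patient can see who asked for their data, while the medical record number itself never appears in the log.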


CHALLENGES 

Data security and privacy 

Blockchain technology upon which NFTs 
are built does not necessarily prevent data 
breaches because digital information itself
is not stored “on-chain.” This leaves only
the metadata (the NFTs) on-chain, protect- 
ing information integrity regarding data 
provenance, terms of data exchange, and 
transaction history. The underlying data 
pointed to by NFTs—the PHI—are only as 
secure as the practices and procedures of 
the myriad online platforms that provide 
health-data storage and access. Without 
proper digital security and data encryption 
infrastructure or technological advance- 
ments to counter the gradual accumulation 
of noncritical failures in data storage (i.e., 
“bit rot”), an NFT could eventually amount
to nothing more than a defunct URL. 
High-stakes NFTs are thus increasingly 
stored in decentralized and highly redun- 
dant networks like the InterPlanetary File 
System (IPFS). These systems reduce server 
resources and costs and provide multiple
sources of backup. Already, the NFT market
has led to the emergence of third-party
“pinning” platforms that afford digital 
content greater longevity, along with other 
data safekeeping solutions that are likely to 
affect digital security more broadly and to 
relieve high processing costs. Additionally, 
new blockchain protocols (notably Solana 
and Fantom at the time of writing) are rap- 
idly evolving to accommodate scalable on- 
chain storage with high throughput while 
also keeping energy costs down. Likewise, 
Arweave has introduced “blockweaves” (as 
opposed to “chains”) to incentivize nodes 
to ensure data replicability and permanent 
storage. These advancements improve data 
security, permanence, and scalability and 
potentially enable “a multichain future” 
where chains can specialize and interop- 
erate to support high-performance smart 
contract platforms. However, they also 
carry other legal and ethical concerns related
to the right to erase or to rectify inaccuracies
in personal data (7), which the immutable
nature of blockchain technology
and decentralized storage might render 
functionally difficult to exercise. 

Another challenge is that the privacy- 
by-design feature of pseudonymity, which 
is so central to NFTs and blockchain, may 
be limited when pseudonyms are attached 
to health data. Many health data are be- 
coming so granular as to constitute digital 
“fingerprints” for which only a few data 
elements could allow for patient reiden- 
tification (8). For NFTs to truly maintain 
pseudonymity, they must be supported 
with advancements in data encryption, for 
example, helping “hash” data before they 
reach human eyes (if they ever do). 
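
The reidentification concern can be made concrete by counting how many records become unique once a few ordinary attributes are combined. The four records and the choice of attributes below are invented for illustration and are not drawn from the cited work.

```python
from collections import Counter

def unique_fraction(records, quasi_identifiers):
    """Share of records whose combination of the given attributes appears only once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)

records = [
    {"zip": "77030", "birth_year": 1980, "sex": "F"},
    {"zip": "77030", "birth_year": 1980, "sex": "M"},
    {"zip": "77030", "birth_year": 1975, "sex": "F"},
    {"zip": "02115", "birth_year": 1980, "sex": "F"},
]
print(unique_fraction(records, ["zip"]))                       # 0.25
print(unique_fraction(records, ["zip", "birth_year", "sex"]))  # 1.0
```

Even with a pseudonym in place of a name, every record in this toy set is singled out by three attributes, which is why pseudonymity alone may not prevent reidentification.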

In addition to technological advance- 
ments, NFT data security and protection 
concerns need to be addressed on a global 
stage and through regulatory 
means. The European Union 
(EU) General Data Protection 
Regulation (GDPR), for in- 
stance, imposes strict obliga- 
tions for processing personal 
data and requires procedural 
safeguards to avoid and re- 
spond to data breaches and 
misuses (9). Although progress 
is being made in the United 
States (e.g., the California 
Consumer Privacy Act) and 
federal-level reforms are being 
debated (10), the US “patch- 
work” regulatory approach 
varies widely by state and cir- 
cumstance and continues to 
lag behind the GDPR (11).

The EU-US Privacy Shield 
arrangement for cross-border 
flow of PHI was invalidated (12) in large
part because of the fact that US regula- 
tion offers limited opportunities for legal 
redress for noncitizens. NFTs could help 
repair these agreements by helping to fill 
gaps in the capacity of individuals to seek 
legal redress in at least two ways. The first 
and most straightforward way is by con- 
cretizing a set of terms by which an indi- 
vidual (dis)agrees to contractually share 
PHI with certain other entities. Like other 
legally binding private contracts between 
individuals and entities, if those terms are 
clearly stated and thoroughly cover a wide 
range of possible data-sharing scenarios 
(including definitions of breach), and if the 
terms of NFT smart contracts are mutually 
recognized by governing bodies relevant 
to the jurisdictions in which PHI is being 
exchanged, then smart contracts should 
constitute a legally binding mechanism 
for seeking redress in cases where PHI exchanges
break the terms of the contract.
A second and more indirect way to address
the lack of legal redress is for NFT-type
smart contract terms to stipulate in
advance the approved entities with which 
an individual agrees to share (or prohibit 
access to) their PHI. By allowing patients to 
prosent in such a way, smart contracts pro- 
vide a preemptive and automated mecha- 
nism for data exchange that is aligned with 
individuals’ preferences. Unless PHI is ac- 
cessed in a data breach (highly unlikely 
by virtue of no longer being stored in a 
centralized database and even less likely 
if stored “on-chain”), those data 
will be unavailable (i.e., “digitally 
locked”) for access, purchase, or 
theft without the NFT owner’s 
consent. A preemptive and au- 
tomatically executed smart con- 
tract might, with the right legal 
design, minimize the need for 
individuals to seek legal redress. 
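
The preemptive mechanism described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `ConsentTerms` class, its field names, and the entity labels are hypothetical and do not correspond to any real NFT standard or blockchain API. The sketch shows terms that name approved and prohibited entities in advance, deny every other requester by default, and record each request so the owner can audit it.

```python
from dataclasses import dataclass, field


@dataclass
class ConsentTerms:
    """Hypothetical stand-in for the sharing terms attached to one patient's PHI token."""

    owner: str
    approved: set = field(default_factory=set)     # entities approved in advance
    prohibited: set = field(default_factory=set)   # entities explicitly barred
    access_log: list = field(default_factory=list) # (requester, granted) pairs

    def request_access(self, requester: str) -> bool:
        # Access is granted only to entities approved in advance and not barred;
        # anyone not listed is denied by default ("digitally locked").
        granted = requester in self.approved and requester not in self.prohibited
        self.access_log.append((requester, granted))
        return granted


# Invented example values, for illustration only.
terms = ConsentTerms(
    owner="patient-001",
    approved={"clinic-a", "registry-b"},
    prohibited={"broker-x"},
)
print(terms.request_access("clinic-a"))     # approved in advance
print(terms.request_access("broker-x"))     # explicitly prohibited
print(terms.request_access("unknown-lab"))  # never listed
```

Whether such terms would be legally binding depends on the recognition by governing bodies discussed above, which a code sketch cannot capture.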


Intellectual property rights 

A major advantage of NFTs when 
discussed in the art world is their 
verification of authenticity and 
originality. Originality may be less of a con- 
cern for HIE. Many entities seeking to use 
PHI may not care about the originality of 
the data; they just care whether those data 
are accurate representations of a patient’s 
PHI. Storing a patient’s health data as an 
NFT could help to verify its provenance and 
accuracy. However, it also raises important 
questions about who legally controls PHI 
and has the right to share or sell it. The 
answer is not straightforward. Intellectual 
property rights (IPRs) over creative works 
usually lie with an artist or, in certain cases, 
with companies that curate digital informa- 
tion. By contrast, PHI is often thought of as 
collected or generated through the use of 
technology, rather than created, thus lack- 
ing the “originality” typically required for 
copyright protection. Once collected or gen- 
erated and organized by hospitals, device 
manufacturers, or pharmaceutical compa- 
nies, those data may become the property of 
the data collector, not the patient. Entities 
that eventually develop clinical follow-on 
innovations based on data stored as NFTs 
may also claim some form of IPR. Because 
rights over health data remain complex and 
contested (13), NFTs may be most useful for 
specifying collaboratively agreed-upon con- 
ditions for exchange and reciprocity, if not 
direct ownership. 


Equity, sustainability, and trust 

Given the complexity of NFTs and the legal 
structure, it is unclear whether average citi- 
zens will be able to take advantage of them 
in the health space. Most people will need 
the aid of trusted intermediaries to mint
their data and manage private keys. They 
will also need user-friendly interfaces to in- 
terpret requester and transaction ledgers. 
In addition, patients will need legal support 
to create smart contracts that serve their in- 
terests. Ironically, this need for intermedi- 
aries could be the slippery slope back to the 
centralization of PHI. If these intermediar- 
ies are costly, socioeconomic factors will act 
as gatekeepers for “digital ledger citizen- 
ship,” exacerbating existing digital divides 
and participation gaps. 


“By automating data-sharing agreements,
smart contracts can address
long-standing inefficiencies and the
lack of transparency in
HIE [health information exchange]...”


Another force winding back to central- 
ization is that a widespread switch to PHI 
stored as NFTs on a blockchain would re- 
quire more computer power to verify a 
growing amount of data and number of 
transactions, resulting in greater energy 
costs and necessitating more capital and 
physical investment. These dynamics may 
incentivize a return to powerful institutions 
and constitute critical challenges to the raison
d’être of decentralized infrastructures.
However, certain blockchain support inno- 
vations are emerging to address energy and 
climate burdens, with implications for the 
democratization of participation through 
reduced processing costs. 


A PATH FORWARD 

In addition to these challenges, attempts 
to democratize data-sharing decisions are 
likely to encounter resistance by key play- 
ers who dominate the multi-billion-dollar 
medical-industrial complex and benefit 
from a status quo where the exchange of 
patients’ digital health data remains unmonitored
and largely unregulated (1).
Their resistance against using NFTs to 
automate patient preferences may further 
strengthen if it is discovered that these 
preferences entail widespread reluctance 
to share personal data without express per- 
mission or compensation, as some evidence 
suggests (14). Any viable consideration of
NFTs for health may require the gradual 
introduction of patient-focused control 
and dynamic consent clauses into existing 
data-sharing agreements, which currently
allow for large-scale exchanges of EHR
data between health care organizations 
and third parties without express patient 
awareness or consent. Smart contracts 
have advantages over existing approaches 
to consent in that conditions for consent 
may be more granular and data exchanges 
can be tracked and verified by patients 
and do not require that exchanges occur 
through third-party custodians. As an im- 
mediate step, NFTs could be explored in 
experimental test cases that involve block- 
chains and smart contracts. Such flexible 
frameworks may offer opportuni- 
ties for simultaneous “symbiotic” 
development of technology and 
regulation to accommodate new 
digital ownership models like 
NFTs and test their impacts in 
real-world settings. Further, they 
could offer empirical insights 
into best practices for balancing 
innovation with individual rights 
and public interests (9). Accom- 
panying qualitative research into 
patient preferences and attitudes 
toward NFT-like frameworks as 
a means to exercise control over patients’ 
PHI will be critical to understanding utility
and acceptance, as well as to ensuring
the accessibility and interpretability of 
their PHI. 
From dualism to deism


A philosopher comes full circle 


By John Zerilli 


Among the various possibilities explored
in David Chalmers’s intriguing
and entertaining romp through philosophy,
Reality+, is a kind of deism:
the view that the universe is the work
of an intelligent being who sets its
laws of operation in motion but then declines 
to intervene further. Deism was the theology 
de rigueur of elite opinion in the late 18th- 
century United States. But in Chalmers’s 
hands, deism is not quite the view that a su- 
pernatural being created the universe. It is 
the view that there is “a serious possibility” 
that our universe is a computer simulation 
run by an advanced civilization. 

There are a few strands to the argument, but 
the nub of it is that if an intelligent civilization 
lasts long enough, it will likely develop 
simulation technology and create many 
simulated universes inhabited by intelligent 
beings. Under minimal assumptions, it can 
be shown that these simulated universes will 
greatly outnumber nonsimulated ones. Thus, 
any extant intelligence has a considerably 
greater chance of being simulated than not. 

As always, the soundness of the argument 
comes down to its premises. It is not clear 
that any civilization can survive long enough 
to develop simulation technology in the first 
place. Furthermore, consciousness may not 
be the sort of thing that can be simulated 
at all. Or ethical constraints in an advanced 
civilization may forbid the running of
simulations containing sentient creatures. 
Chalmers considers all these objections 
and several more. He concludes that we can 
be highly confident that one of the following 
three scenarios holds true: (i) we are simulated 
entities, (ii) humanlike simulated entities
are impossible, or (iii) humanlike simulated 
entities are possible, but few humanlike 
simulators will create them. He 
calculates the probability of the
first scenario to be at least 25%.
From this it follows that we cannot
be sure we are not in a simulation. 
Furthermore, “If the simulation 
argument is even approximately 
as good as the design argument, it 
deserves to be in the pantheon of 
arguments for God’s existence.” 

Chalmers attained rock-star 
status in philosophy during the late 
1990s as a defender of dualism, a 
metaphysical hypothesis that, at the time, 
was considered more or less defunct among 
naturalistically inclined philosophers. Dualists 
maintain that consciousness cannot be 
accounted for in purely physical, materialist 
terms (what Chalmers famously dubbed the 
“hard problem”). While dualism is certainly 
not a mainstream view, it is once again being 
taken seriously by a substantial minority 
of both scientists and philosophers, in no 
small part owing to Chalmers’s distinctive 
arguments and thought experiments. 

In this latest offering, Chalmers seems to 
have come full circle, articulating what he 
describes as an entirely naturalistic account 
of God, i.e., a god not exempt from natural
laws. That is why the book could mark a
turning point in educated opinion. It may be
that Chalmers will do for deism what he was
able to do for consciousness: make the idea
respectable again.

Reality+: Virtual Worlds and the Problems of Philosophy
David J. Chalmers
Norton, 2022. 544 pp.

As virtual reality devices become more sophisticated,
what we consider possible expands.

There are two reasons why the times may 
favor neo-deism. First, Chalmers is not alone 
or even the first high-profile secular or atheist 
voice to have mooted “simulation theology.”
There are already enough heavyweights 
from the worlds of science, big business, 
and philosophy to make simulation theology 
respectable among an important section of 
the intelligentsia. Neil deGrasse Tyson, Elon 
Musk, and Nick Bostrom all spring to mind. 

Second, computer science has expanded 
the horizons of what is considered possible. 
Consider that the Enlightenment forerunner 
of the simulation hypothesis was a thought 
experiment proposed by René Descartes. But 
Descartes’s argument trafficked in evil spirits. 
Compelling though his meditations may once 
have been, their force was bound to wane in a 
secular age. Enter virtual reality, its inevitable 
improvement over the next few decades, and 
the possibility of a Matrix-style immersive
experience, and suddenly Descartes’s “evil 
demon” acquires a most contemporary garb. 

There is a lot more in this book than 
can be conveyed in the space of a short 
review. Chalmers has something to say on 
most of the “big questions” in philosophy: 
on the immortality of the soul 
(simulations “may at least make an 
afterlife possible”); the freedom of 
the will (“the jury is still out”); the 
existence of an external world (even 
a simulated world would be real; it’s
just that at the most fundamental 
level it would be made up of bits 
in the simulator’s computer, not 
quarks and electrons); and, of 
course, the existence of God. He 
also considers questions of value 
and politics in virtual worlds, 
although these are not the most original 
parts of the book. While on all of these issues 
Chalmers cuts a Gordian knot in writing both 
accessibly and illuminatingly, it is the material 
in part 3 on the reality of virtual worlds, part 5 
on the possibility of consciousness and mind- 
body interaction in digital worlds, and part 
7 on language and structuralism in physics 
where Chalmers breaks new ground. 

Chalmers’s Reality+ is sure to roil a good 
deal of philosophical banter—and not just in 
graduate seminar rooms. More seriously, it 
offers a lot for theists, atheists, and agnostics 
to ponder as they reassess their already 
“considered” views on a well-worn subject. 

10.1126/science.abn2690 
Imagining Rosalind Franklin


The crystallographer’s story comes alive in a work
of historical fiction


By Katie Langin 


In Her Hidden Genius, author Marie Benedict
transports readers to another time:
Europe is rebuilding after World War II,
the shock of the Holocaust reverberates,
food rationing abounds. But the story’s
central struggles will feel all too familiar
to anyone who has set foot in a modern sci- 
entific laboratory, as much of the action takes 
place in research environments beset with 
bullying, competition, and sexism. 

The novel centers on a brilliant scientist 
who died far too young: Rosalind 
Franklin. Benedict has made a ca- 
reer out of writing novels about 
historically important women, pro- 
filing such figures as physicist Mileva 
Maric, novelist Agatha Christie, and 
librarian Belle da Costa Greene. She 
bases her fictionalized narratives on 
what is known about each woman, 
painting a picture of their lives with 
invented scenes and dialogue. 

We meet Franklin on the streets of 
Paris as she walks to her first day of 
work at the Laboratoire Central des 
Services Chimiques. The 26-year-old, 
who received her PhD from the Uni- 
versity of Cambridge 2 years earlier, 
is greeted warmly by lab members—a 
welcome change from her experience 
in England. In the months that fol- 
low, she learns x-ray crystallography, 
a technique that can be used to visu- 
alize the molecular structure of many 
substances, and becomes a star of 
the lab, known for her experimental 
prowess and technical skill, which 
she uses to study graphite. 

Four years later, she leaves the 
Paris lab and accepts a position at 
King’s College London. “I worry that I’ve 
made the wrong choice,” Benedict’s Franklin 
remarks. She loves Paris and the lab in which 
she works, but she is drawn back to England, 
in part to be closer to her family—an upper- 
class Jewish clan with deep roots in London. 
It is a career decision that proves pivotal. 

On the first day in her new lab, Franklin 
learns that she is to use her x-ray crystal- 
lography skills to decipher the structure of 
DNA. A physical scientist by training, she
balks at first, responding, “Pardon?...Not 
crystalline substances?” But she quickly 
dives headfirst into the task, working long 
hours to generate images of DNA with un- 
precedented clarity. The helical structure 
of DNA begins to become apparent, but 
Franklin and her assistant, Raymond Gos- 
ling, keep the details of their discoveries 
largely secret while they work to amass an 
unimpeachable body of evidence. 
Meanwhile, a pair of young scientists
at the University of Cambridge, Francis
Crick and James Watson, enter the race to
describe the structure of DNA and begin
working on a theoretical model. Franklin
rejects their first attempt, which featured
phosphate groups on the inside of the DNA
strand and bases on the outside, declaring
it scientifically impossible. But they
eventually get it right, aided by Franklin’s
images and data, which were slipped to
them without her knowledge or permission.
(Maurice Wilkins, a King’s College
colleague who had befriended Crick and
Watson, is a prime suspect.)

Franklin examines a sample with a microscope in 1955.


Her Hidden Genius: A Novel
Marie Benedict
Sourcebooks Landmark, 2022. 304 pp.


The book ends with Franklin, aged 37, on 
her deathbed. She was diagnosed with ovar- 
ian cancer a year and a half earlier while 
doing pioneering work on RNA viruses at 
Birkbeck College. She continued that work 
while undergoing treatment, at one point 
believing she had been cured. “Science has 
taken care of me. As it always has,” she declares.
But the cancer comes back, and she
dies in London on 16 April 1958. 

Much of this tale will be familiar to sci- 
entific readers, but in Benedict’s telling, 
Franklin’s struggles come alive. She engages 
in fierce disputes with Wilkins, who 
eventually wins the Nobel Prize, 
along with Watson and Crick, for his 
work visualizing DNA. She misses 
important networking opportunities, 
in part because King’s College has a 
men-only dining area. She is repeat- 
edly referred to as “Miss Franklin” 
instead of “Dr. Franklin,” an annoyance
that will likely resonate with
many women scientists today. And 
she bristles at the nickname “Rosy,” 
which Watson, Crick, and Wilkins 
use behind her back. (Watson was 
criticized for using this nickname in 
his 1968 book The Double Helix.) 

Throughout the book, I found 
myself wondering whether certain 
conversations and events were 
based in fact or whether they were 
products of Benedict’s imagination, 
which led me to scour nonfiction 
sources for further information 
about Franklin. Readers who find 
themselves in a similar situation 
might choose to read as a compan- 
ion Brenda Maddox’s biography 
Rosalind Franklin: The Dark Lady 
of DNA, which Benedict consulted 
during her research for this novel. 

Overall, Benedict’s retelling of Franklin’s 
story offers a compelling look at the scien- 
tist’s impressive and all-too-short life. It also 
raises broader questions about the scientific 
enterprise: Are conditions much better for 
women scientists today? Does academia’s 
first-to-publish reward system pervert the 
process of science? Who deserves credit for 
a scientific discovery? There are no easy 
answers in Her Hidden Genius, but there is 
much food for thought. 

10.1126/science.abn2940 




PHOTO: UNIVERSAL HISTORY ARCHIVE/UNIVERSAL IMAGES GROUP/GETTY IMAGES 


A man guides a car through the Great Smog of 1952 in London. The acidity of the particles in air pollution affects how harmful they are to humans.
Particle toxicity’s role
in air pollution 


In their Report “Abating ammonia is
more cost-effective than nitrogen oxides
for mitigating PM₂.₅ air pollution” (5
November 2021, p. 758), B. Gu and colleagues
propose that reducing ammonia
(NH₃) emissions could decrease air pollution
caused by particles of less than
2.5 μm in diameter (PM₂.₅), a change that
they predict would benefit human health.
However, not all particles affect health
equally (1-4). Because ammoniated PM₂.₅
is less acidic than sulfuric particulate matter
formed by, for example, burning coal
(5), decreasing particles formed with NH₃
may make the remaining air pollution 
more lethal. Air pollution mitigation strat- 
egies should consider the risk to health 
posed by various components, not just the 
total particulate mass. 

The role of acidity in enhancing particle 
toxicity has been recognized since the 
Great Smog of London in 1952. During the 
5 days of extreme air pollution in the city, 
animals with higher NH₃ exposures were
less adversely affected, and physicians
placed vials of NH₃ in hospital wards
to protect patients (6, 7). Subsequent
research has confirmed that NH₃ in the
air reduces the acidity of ambient par- 
ticles (8) and that acidity mobilizes toxic 
transition metals, inducing oxidative 
stress (9-11). Moreover, a recent epidemiological
study has determined that the oxidative
potential of outdoor PM₂.₅ is associated
with acute cardiovascular events, and
combined exposure to transition metals 
and acidic sulfate enhances those cardio- 
vascular effects (12). 

Because the toxicities of PM₂.₅ components
vary, estimates of the health impacts of
each component should take into account
its individual properties. Gu et al.’s suggested
reduction in NH₃ emissions might
well reduce PM₂.₅ mass but would also
increase the acidity of the aerosol mixture.
Rather than achieve the predicted health
benefits, the change could regionally
increase adverse health effects where
acid-neutralizing NH₃ emissions are diminished.
The health benefits that Gu et al. expect 
must be confirmed experimentally before 
the implementation of such a policy. 
Response


Thurston et al. argue that ammonia (NH₃)
abatement may not reduce the adverse
health effects of particles with a diameter
of less than 2.5 μm (PM₂.₅) due to
the dependence of toxicity on the acidity
of PM₂.₅. Although they have usefully
highlighted the effect of acidity of PM₂.₅
on human health, there is no definitive
evidence that quantification of the effects
of PM₂.₅ components separately should be
recommended in policy-making (1) or that
emission controls of ammonia like those
we suggest would substantially change the
aerosol acidity. We are not arguing for NH₃
controls in isolation; rather, we contend
that NH₃ abatement can play an important
role in reducing exposure to PM₂.₅ and
associated health impacts in the context of
continued mitigation of other pollutants,
such as sulfur dioxide (SO₂) and nitrogen
oxides (NOₓ).

PM₂.₅ can vary across regions from
highly acidic (pH of ~0.5) to mildly acidic
(pH of ~6) (2). In the United States and
Canada, large reductions in SO₂ and NOₓ
emissions over the past decade have not
resulted in clear changes to acidity (3, 4).
Global reduction of agricultural NH₃ emission
alone by 50% (similar to the proposed
mitigation in our study) would reduce
PM₂.₅ pH (i.e., increase acidity) by about 0.6
units (5), and we would expect even weaker
changes with joint controls of SO₂ and NOₓ.
Whether such changes in aerosol acidity 
are sufficient to affect the mobilization of 
harmful transition metals is still unknown. 
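
To put the pH shift of about 0.6 units in perspective, recall that pH is the negative base-10 logarithm of hydrogen ion activity, so a drop of 0.6 units corresponds to a roughly fourfold increase in that activity. The short sketch below does this arithmetic; the function name is ours, and the calculation says nothing about whether such a change is enough to mobilize transition metals.

```python
def acidity_fold_change(delta_ph: float) -> float:
    """Fold change in hydrogen ion activity for a pH change of delta_ph.

    Uses the definition pH = -log10(a_H+), so a negative delta_ph
    (a lower pH) gives a fold change greater than 1.
    """
    return 10 ** (-delta_ph)


# A pH decrease of about 0.6 units, the figure quoted in the text.
fold = acidity_fold_change(-0.6)
print(f"Hydrogen ion activity changes by a factor of about {fold:.1f}")
```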

Emissions of air pollutants have changed 
substantially since the 1952 Great Smog 
of London (6). At that time, SO₂ emissions
from coal burning were indeed a dominant
reason for adverse health effects (7), likely
due in part to acute acidity. The use of NH₃
alleviated the acute acidity, but its effect
could also be ascribed to a reduction in
exposure to toxic concentrations of SO₂ (8).

Emission controls of SO₂ and NOₓ have
a long history, whereas NH₃ has too often
been ignored (6, 9). It would thus be unrealistic
to imagine effective control of NH₃
and unregulated emissions of SO₂ and NOₓ.
We argue for the need to start to control
NH₃ emission given its large contribution to
PM₂.₅ formation and its high cost-efficiency
of abatement, thereby catching up to the
progress already made in reducing SO₂ and
NOₓ emissions.
The protein-folding
problem: Not yet solved 


We agree with H. H. Thorp (“Proteins, pro- 
teins everywhere,” Editorial, 17 December 
2021, p. 1415) and numerous others (1) that
the advance in protein structure predic- 
tion achieved by the computer programs 
AlphaFold (2) and RoseTTAfold (3) is 
worthy of special notice. The accuracies
of the predictions afforded by these new
approaches, which use machine-learning 
methods that exploit the information 
about the relationship between sequence 
and structure contained in the databases 
of experimental protein structures and 
sequences, are much superior to previous 
approaches. However, we do not agree with 
Thorp that the protein-folding problem has 
been solved. 

AlphaFold achieves a mean C-alpha root
mean square deviation (RMSD) accuracy
of ~1 Å for the Critical Assessment of
Structure Prediction 14 (CASP14) dataset
(2). This accuracy corresponds to that of
structures determined by x-ray crystallography
or single-particle cryo-electron
microscopy at very low resolution. The
accuracy of these methods is several times
better than machine learning methods; for
example, at 3 Å resolution, the coordinate
C-alpha RMSD accuracy for empirically
determined structures is far better than
1 Å. At present, for the best cases, the
C-alpha coordinate RMSD accuracy of
AlphaFold-predicted structures roughly
corresponds to the accuracy expected for
structures determined at resolutions no
better than ~4 Å. Thus, although structural
predictions by AlphaFold and RoseTTAfold 
may be accurate enough to assist with 
experimental structure determination 
(3), they alone cannot provide the kind of 
detailed understanding of molecular and 
chemical interactions that is required for 
studies of molecular mechanisms and for 
structure-based drug design. 
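
For readers unfamiliar with the metric, the C-alpha RMSD discussed above is the square root of the mean squared distance between matching C-alpha atoms in two structures. The sketch below is a minimal illustration, assuming the two coordinate sets are already superimposed and listed in the same residue order; the function name and the toy coordinates are invented for the example and are not taken from AlphaFold, CASP14, or any experimental structure.

```python
import math


def ca_rmsd(predicted, reference):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Both lists must use the same units (e.g., angstroms) and the same
    residue order, and are assumed to be superimposed already.
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared_sum / len(predicted))


# Toy coordinates in angstroms: every predicted atom is displaced by exactly 1 Å.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (3.8, -1.0, 0.0), (7.6, 0.0, 1.0)]
print(round(ca_rmsd(predicted, reference), 3))
```

In practice, structures are first optimally superimposed before the RMSD is computed, a step omitted here for brevity.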

A further complication for structure 
prediction is the dynamic structural varia- 
tion in a given sequence. Allosteric states, 
which can differ dramatically, may be in 
an intrinsic equilibrium or depend on a 
binding partner, which may be a ligand or 
cofactor (e.g., ATP or cobalamin), another 
macromolecule (e.g., DNA or a protein
partner), or aberrant self-association (e.g.,
pathogenic amyloids). Work is in prog- 
ress to address protein complexes (4, 5), 
but structure prediction remains to be 
achieved for those in complicated molecu- 
lar machines and for those with ligands 
that affect conformation, which may be as 
yet unidentified. 

Recent advances should be taken as a 
call for further development. Moreover, 
lessons should be learned from history. In 
1990, Alwyn Jones and Carl-Ivar Brandén 
published a commentary on errors in x-ray 
crystal structures (6) that stimulated the 
development of cross-validation and vali- 
dation tools for structural biology (7-9) 
and that ultimately made the databases of 
experimental structures much more reli- 
able. Thus, tools should be developed to 
assess coordinate accuracy of predictions 
and alleviate bias toward structural pat- 
terns observed in repositories. 

Finally, it is necessary to reflect on 
what the word “solved” might mean in 
the context of the protein-folding prob- 
lem. Some may feel that this problem will 
have been solved once any method has 
been found that enables one to obtain 
accurate predictions of the structures of 
proteins from their sequences. AlphaFold 
and RoseTTAfold represent a major step 
forward in that direction, but they are not 
the final answer. Others, including us, feel 
that solving the protein-folding problem 
means making accurate predictions of 
structures from amino acid sequences 
starting from first principles based on the 
underlying physics and chemistry. Despite 
these major advances in protein structure 
prediction, experimental structure deter- 
mination remains essential. 
MATERIALS SCIENCE
Artificial enamel analog 


Tooth enamel is the thin outer layer of our teeth and is the
hardest biological material in the human body. Zhao et al. 
engineered an enamel analog consisting of assembled 
hydroxyapatite nanowires with amorphous intergranular phase 
segments aligned using scalable, dual-directional freezing in the 
presence of polyvinyl alcohol. The artificial tooth enamel was designed 
to closely mimic the composition of the natural material by copying 
the shapes and sizes of the components found biologically and the 
organization of their interfaces. —MSL

Science, abj3343, this issue p. 551


Self-assembly of hydroxyapatite nanowires can produce an artificial analog of 
enamel, the hard biomineral that covers the surface of the teeth.


Unmasking place fields 
in hippocampal CA1 


A basic transformation process
in the brain is the conversion
of a neuron’s excitatory and
inhibitory inputs to spikes.
Experimentally examining
the transformation process
requires access to subthreshold
membrane dynamics. To
date, only intracellular record- 
ings have met this requirement. 
Valero et al., using a new tech- 
nique based on optogenetic 
stimulation to probe the excit- 
ability of neurons, examined the 
subthreshold activity dynamics 
of CA1 pyramidal neurons dur- 
ing sharp-wave ripples, theta 
oscillations, and place fields. 
During sharp-wave ripples, 
overall excitability shifted 
toward synaptic inhibition. 
However, during theta waves 
and in the center of place 
fields, excitability moved in the 
direction of synaptic excitation. 
This stimulation unmasked the 
place fields of nonplace cells, 
indicating that the proportion
of place cells in CA1 is much
higher than previously thought.
—PRS 

Science, abm1891, this issue p. 570 


Genes control cortical 
surface area 


Humans exhibit heritable 
variation in brain structure and 
function. To identify how gene 
variants affect the cerebral cor- 
tex, Makowski et al. performed 
genome-wide association stud- 
ies in almost 40,000 adults and 
9000 children. They identified 
more than 400 loci associated 
with brain surface area and 
cortical thickness that could
be observed through magnetic
resonance imaging analyses.
Examining the biological
pathways linking gene vari- 
ants to phenotypes identified 
region-specific enrichments of 
neurodevelopmental functions, 
some of which were associ- 
ated with psychiatric disorders. 


Partitioning genes with heritable 
variants relative to evolutionary 
conservation helped to identify 
a hierarchy of brain develop- 
ment. This analysis identified a 
human-specific gene-phenotype 
association related to speech 
and indicates which genes
can be studied in various model
organisms. —LMZ 

Science, abe8457, this issue p. 522 


A clean break for
C-H bonds 


Carbon-hydrogen (C-H) bonds
are ubiquitous in pharma- 
ceuticals and plastics but are 
difficult to transform. Fazekas 
et al. report a versatile reagent 
that strips hydrogen without 
immediately trapping the 
carbon. Heating or photoly- 
sis of the reagent produces a 
pair of radicals, one of which 
rapidly cleaves a C-H bond 
while the other remains com- 
paratively inert. A wide variety 
of other radical sources can 
then intercede to form carbon-halogen,
carbon-carbon,
and carbon-sulfur bonds. A 
two-step upcycling sequence 
that added imidazolium groups 
to postconsumer polyethylene 
foam produced a potentially 
valuable ionomer. —JSY 

Science, abh4308, this issue p. 545


Deriving primitive 
endoderm stem cells 


The mammalian blastocyst 
forms early in development and 
consists of three distinct cell 
types: epiblasts, trophoblasts, 
and primitive endoderm (PrE). 
Although stem cell lines that 
retain the functional properties 
of epiblasts or trophoblasts have 
been established, we lack stem 
cell lines that retain the develop- 
mental potential of PrE, which 
gives rise to extra-embryonic 
lineages that nourish the embryo 
and promote its development. 
Ohinata et al. report derivation 
of PrE stem cells that are able to 
give rise to all extra-embryonic
primitive endoderm tissues and 
support fetal development of PrE- 
depleted blastocysts in mouse 
chimeras. —BAP 

Science, aay3325, this issue p. 574 


Nighttime surprise 
Isoprene, which is emitted
primarily by terrestrial veg- 
etation, is the most abundant 
volatile organic compound 
in Earth's atmosphere and is 
central to controlling the oxidiz- 
ing capacity of the troposphere 
and forming organic aero- 
sols. Palmer et al. report that 
nighttime concentrations of 
tropospheric isoprene are unex- 
pectedly high in much of the 
tropics. The authors link these 
anomalies to low concentra- 
tions of atmospheric nitrogen 
oxides and suggest that their 
findings will help to explain 
some observations of elevated 
levels of cloud condensation in 
the lower troposphere. —HJS 
Science, abg4506, this issue p. 562 


Feedback for breeding 
familiarity 

Social memory enables the 
recognition of others and the 
formation and maintenance of 
relationships and is partially 
supported by the hormone oxy- 
tocin. Wang et al. found that an 
oxytocin-receptor-dependent 
positive feedback loop contributes
to long-term social memory
in rodents. Reciprocal phos- 
phorylation between the receptor 
and the kinase PKD1 promoted 
downstream oxytocin recep- 
tor signaling in cultured cells. 
Rodents in which this loop was 
disrupted in the medial amygdala 
of the brain showed behav- 
iors and neuronal activity that 
indicated impaired recognition of 
familiar cage mates. —LKF 

Sci. Signal. 15, eabd0033 (2022). 


Sealing the deal 

Tissue sealants and adhesives 
are potentially useful alterna- 
tives to sutures for tissue repair, 
but application to wet tissue 
can be complex or take too long 
to set during surgery. Wu et al. 
developed a flexible, transpar- 
ent adhesive polymer hydrogel 
patch that seals gastric tissue 
defects. The patch could be 
applied to wet tissue and 
showed strong adhesion shortly 
after application and when fully 
swollen (6 hours after applica- 
tion). Patches sealed defects in 
rat colon, stomach, and small 
intestine, promoting tissue heal- 
ing and maintaining adhesion 
over 4 weeks. The technology 
could be scaled to seal defects 
in pig colon. Results support 
further investigation of this 
easy-to-apply patch as an alter- 
native to commercially available 
tissue adhesives. —CC 

Sci. Transl. Med. 14, eabh2857 (2022). 


A molecular feedback loop in neurons mediates social bonding in rodents. 
ECOLOGICAL ECONOMICS 


Edited by Caroline Ash 
and Jesse Smith 


Protecting wetlands pays off 


The 47 million hectares of US wetlands provide an estimated 
$1.2 to 2.9 trillion in flood damage mitigation, benefits 
that can reach far downstream. Taylor and Druckenmiller 
integrate data on flood insurance claims, hydrography, 
land cover, and property values to show that wetland loss 
to development increased insurance claims. Wetland increase 
showed no impact, suggesting that conservation should be pri- 
oritized over restoration. Wetlands that were 500 to 750 meters 
from the nearest stream or river were the most valuable, calling 
into question the 2020 removal of protections for “isolated” 
wetlands lacking a connection to surface water. —BW 
Am. Econ. Rev. https://www.aeaweb.org/articles?id=10.1257/aer.20210497 (2021). 


Global peanut 
improvement 


Wild relatives of crop plants 
can supply genetic diversity 
useful for improving agricultural 
yields. In one such interaction 
six decades ago, the cultivated 
peanut was hybridized with 

a wild relative. The improved 
cultivar was resistant to certain 
diseases. Since then, those 
interested in crop improvement 
shared the peanuts through 
international networks. Bertioli 
et al. track where the hybrids 
have been shared globally. This 
genetic and pedigree analysis 
finds traces of the improved 
peanut cultivar in Africa, Asia, 
Oceania, and the Americas. 
Thus, food security and agricul- 
tural sustainability have been 
aided both by the scientific 
access to diverse gene plasms 
and by the social network that 
shared the results. —PJH 


Proc. Natl. Acad. Sci. U.S.A. 118, 
e2104899118 (2021). 


Choosing simultaneously 
or sequentially 


Economic decisions are linked to 
neuronal activity in the orbito- 
frontal cortex. Neurons in this 
brain region represent different 
decision variables in a categorical 
way. For example, when animals 
choose between different types 
of juice drinks, different groups of 
neurons encode individual offer 
values into the binary decision 
and the chosen value. In most 
studies, two types of juices 
were presented simultaneously. 
However, in real life, choices often 
appear sequentially. Therefore, 
Shi et al. alternated trials under 
simultaneous and sequential 
offers of drinks. The authors 
found that the same neural 
circuits supported both types of 
choice sequence. Ideas about 
how economic choices are made 
can now be generalized to a 
broader domain of decisions than 
previously recognized. —PRS 

J. Neurosci. 42, 33 (2022). 
Identifying deepfake 
videos 


Deepfake videos that have 
been digitally manipulated, 
from face swaps to filters, may 
look authentic to the untrained 
eye. As society debates the 
ethics of such technology, 
computer scientists are look- 
ing for ways to help humans 
and computer models better 
discern authentic videos 

from deepfakes. Groh et al. 
used data from more than 
15,000 human participants 

to show that humans are on 
average as accurate as the 
leading detection model at 
detecting deepfakes, but they 
make different mistakes. The 
researchers are hoping that the 
differences they have identi- 
fied can help in the design of 
future models that benefit from 
understanding the weaknesses 
and strengths of humans. —YY 


Proc. Natl. Acad. Sci. U.S.A. 119, 
e2110013119 (2022). 
Repairing 
UV-damaged skin 

Sunburn and prematurely aged 
skin can be caused by exposure 
to high levels of ultraviolet (UV) 
radiation. Rognoni et al. exam- 
ined UV damage repair processes 
in human and mouse skin. The 
authors found that acute and 
chronic UV exposure selectively 
killed fibroblast cells in the 
papillary dermis layer of the skin. 
Live imaging and lineage trac- 
ing showed that acute damage 
was repaired by local papillary 
fibroblast proliferation with little 
need for fibroblast migration. 
Chronic UV exposure instead led 
to the recruitment and migration 
of neutrophils and T cells into the 
skin. These changes promoted 
fibroblast survival, migration, and 
damage repair. —SMH 

eLife 10, e71052 (2021). 


Ultraviolet radiation selectively kills fibroblast cells in the dermis of the skin. 


Wetland loss increases the 
cost of flooding and affects the 
resulting insurance claims. 


Keeping the sweat off 
Hot weather requires unique 
strategies for cooling the human 
body. Recent advances have 
included improving radiative 
heat transfer through fabrics. 
Alternatively, fabrics that wick 
sweat can lower skin tem- 
perature. Peng et al. developed 
an integrated cooling textile 
(i-Cool) designed to efficiently 
wick and evaporate sweat. The 
authors accomplished this by 
integrating channels into the 
fabric that quickly move sweat 
from the skin to the surface. 
The fabric also has enhanced 
heat transport compared with 
cotton. The strategy may lead 
to other textiles that are more 
comfortable to wear in hot 
weather. —BG 

Nat. Commun. 12, 6122 (2021). 


Surviving white-nose 
syndrome 


White-nose syndrome, an emerg- 
ing infectious disease caused by 
the fungus Pseudogymnoascus 
destructans, has been devastating 
bat populations since its introduc- 
tion into North America in 2006. 
However, some bat populations 
have been able to rebound after 
the epidemic. Grimaudo et al. 
examined the roles of host traits 
and environmental conditions on 
white-nose syndrome severity 
and the persistence of remnant 
little brown bat (Myotis lucifugus) 
populations. Using a fully factorial 
translocation experiment, the 
authors found higher bat survival 
in established fungus-infected 
sites than occurred during initial 
epidemics, indicating that some 
level of resistance emerged in 
the bats. However, this effect 
was highly dependent on local 
temperature and humidity condi- 
tions. —BEL 

Ecol. Lett. 10.1111/ele.13942 (2021). 
CORONAVIRUS 
The unknowns of 
an antiviral strategy 


Broad antiviral drugs are 
needed to target RNA viruses, 
including severe acute respira- 
tory syndrome coronavirus-2 
(SARS-CoV-2), Zika virus, and 
Chikungunya virus, which have 
caused numerous epidem- 
ics. Lethal mutagenesis is an 
antiviral strategy whereby drugs 
form mutagenic ribonucleosides 
in host cells that are used in 
viral RNA genome replication, 
resulting in enough mutations 
to inactivate a replicating virus 
and thereby limit infection. The 
antiviral drug molnupiravir is 
designed to work in this manner 
and has recently been approved 
for the treatment of COVID-19. 
In a Perspective, Swanstrom and 
Schinazi discuss the potential 
risks of this antiviral approach, 
including the possibility of 
producing variants and the 
potential for host DNA mutagen- 
esis. The potential for long-term 
effects suggests that safety 
assessments of mutagenic 
drugs should be examined more 
closely. —GKA 

Science, abn0048, this issue p. 497 


CHEMICAL POLLUTION 
Living with forever 
chemicals 

Per- and polyfluoroalkyl sub- 
stances (PFAS) are products of 
the modern chemical industry 
that have been enthusiastically 
incorporated into both essen- 
tial and convenience products. 
Such molecules, containing fully 
fluorine-substituted methyl or 
methylene groups, will persist 
on geologic time scales and can 
bioaccumulate to toxic levels. 
Evich et al. review the sources, 
transport, degradation, and 
toxicological implications of 
environmental PFAS. Despite 
their grouping together, these 
compounds are heterogeneous 
in chemical structure, properties, 
transformation pathways, and 
biological effects. Remediation 
is possible but expensive and 
is complicated by dispersion in 
soil, water, and air. It is important 
that we thoroughly investigate 
the properties of potential 
replacements, many of which are 
merely different kinds of PFAS, 
and work to mitigate the harms 
of the most toxic forms already 
released. —MAF 

Science, abg9065, this issue p. 512 


IMMUNOLOGY 
Probing human T cell 
function using CRISPR 


CRISPR activation (CRISPRa) 
and CRISPR interference 
(CRISPRi) screens are powerful 
tools to test the gain and loss of 
gene function, but their use has 
largely been limited to immor- 
talized cell lines. Schmidt et al. 
report an optimized method 
that allowed them to perform 
genome-wide CRISPRa and 
CRISPRi screens on primary 
human T cells. This approach 
was then used to scrutinize 
genes regulating the production 
of key therapeutically relevant 
cytokines. The combination of 
pooled CRISPRa perturbations 
with single-cell RNA sequenc- 
ing (CRISPRa Perturb-seq) then 
allowed them to interrogate 
how the regulators of cytokine 
production can control T cell 
activation and programming into 
distinct postactivation states. 
—STS 

Science, abj4008, this issue p. 513 


PLANT SCIENCE 
A volatile defense 
against leafhoppers 


In established ecosystems, 
plants often fend off their 
insect attackers using chemi- 
cal defenses that are elicited by 
herbivore wounding. Some of 
these same insects are pests 

in the agricultural setting as 
well, attacking plants that have 
not benefited from chemical 
defenses evolved over the ages. 
Bai et al. leveraged genetic 
diversity in a population of 
Nicotiana attenuata plants that 
they grew in the plant's native 
habitat in Arizona to study how 
their chemical defenses provide 
resistance to the herbivorous 
leafhoppers. A multi-omics 
approach led to the identifica- 
tion of a volatile compound from 
leaves that confers resistance to 
those leafhoppers. —PJH 
Science, abm2948, this issue p. 514 


MOLECULAR BIOLOGY 
Reassessment of DNA 
6mA in eukaryotes 


Certain forms of chemical 
modifications to DNA play 
important roles across the 
kingdoms of life; some forms 
have been widely studied and 
others are relatively new. DNA 
N⁶-methyldeoxyadenosine 
(6mA), which was recently 
reported to be prevalent across 
eukaryotes, has created excite- 
ment as a target to study in 
biology and diseases. However, 
some studies have highlighted 
confounding factors, and there 
is an active debate over 6mA in 
eukaryotes. Kong et al. describe 
a method for quantitative 6mA 
deconvolution and report that 
bacterial contamination explains 
the vast majority of 6mA in 
DNA samples from insects and 
plants. The method also found 
no evidence for high 6mA levels 
in humans (see the Perspective 
by Boulias and Greer). This work 
advocates for a reassessment of 
6mA in eukaryotes and provides 
an actionable approach. —DJ 
Science, abe7489, this issue p. 515; 
see also abn6514, p. 494 


QUANTUM GASES 
Characterizing 
second sound 


Heat usually propagates 
diffusively, but it can also under 
certain circumstances propagate 
like a wave, much as sound does. 
This phenomenon, called second 


sound, has been observed in 
superfluids, including helium and 
ultracold atomic gases. However, 
measuring the attenuation of 
second sound remains tricky. Li 
et al. accomplished this feat by 
creating a uniform ultracold gas 
of strongly interacting fermionic 
lithium atoms with a very large 
Fermi energy. Placing the gas in 
an external periodic potential 
and measuring the response, the 
researchers extracted the coef- 
ficients characterizing second 
sound attenuation. —JS 

Science, abi4480, this issue p. 528 


EMERGING COMPUTING 
Reconfigurable 
neuromorphic functions 


Having all the core functionality 
required for neuromorphic com- 
puting in one type of a device 
could offer dramatic improve- 
ments to emerging computing 
architectures and brain-inspired 
hardware for artificial intel- 
ligence. Zhang et al. showed 
that proton-doped perovskite 
neodymium nickelate (NdNiO₃) 
could be reconfigured at room 
temperature by simple electrical 
pulses to generate the different 
functions of neuron, synapse, 
resistor, and capacitor (see 
the Perspective by John). The 
authors designed a prototype 
experimental network that not 
only demonstrated electrical 
reconfiguration of the device, but 
also showed that such dynamic 
networks enabled a better 
approximation of the dataset for 
incremental learning scenarios 
compared with static networks. 
—YS 

Science, abj7943, this issue p. 533; 

see also abn6196, p. 495 


HIV 


Evolving virulence in HIV 
Changes in viral load and CD4⁺ T 
cell decline are expected signals 
of HIV evolution. By examining 
data from well-characterized 
European cohorts, Wymant et al. 
report an exceptionally virulent 
subtype of HIV that has been 
circulating in the Netherlands 
for several years (see the 
Perspective by Wertheim). More 
than one hundred individuals 
infected with a characteristic 
subtype B lineage of HIV-1 
experienced CD4⁺ cell count 
declines at double the expected 
rate. By the time they were 
diagnosed, these individuals 
were vulnerable to develop- 
ing AIDS within 2 to 3 years. 
This virus lineage, which has 
apparently arisen de novo since 
around the millennium, shows 
extensive change across the 
genome affecting almost 300 
amino acids, which makes it dif- 
ficult to discern the mechanism 
for elevated virulence. —CA 
Science, abk1688, this issue p. 540; 
see also abn4887, p. 493 


GREENHOUSE GASES 
Ultra smart 


Methane emissions from oil and 
gas production and transmission 
make a significant contribution 
to climate change. Lauvaux et 
al. used observations from the 
satellite platform TROPOMI to 
quantify very large releases of 
atmospheric methane by oil and 
gas industry ultra-emitters (see 
the Perspective by Vogel). They 
calculate that these sources rep- 
resent as much as 12% of global 
methane emissions from oil and 
gas production and transmission 
and note that mitigation of their 
emissions can be done at low 
cost. This would be an effective 
strategy to economically reduce 
the contribution of this industry 
to climate change. —HJS 
Science, abj4351, this issue p. 557; 
see also abm1676, p. 490 


GENE REGULATION 
Organization shapes 
expression 


The role of genome organiza- 
tion in the regulation of gene 
activity during development has 
been the subject of considerable 
controversy. Batut et al. present 
evidence that dedicated “tether- 
ing elements” help to establish 
long-range enhancer-promoter 
interactions in the Drosophila 
genome (see the Perspective by 
Gaskill and Harrison). Single-cell 
imaging of transcription in living 
embryos showed the importance 
of these elements in determining 
the timing of Hox gene activation 
during development. Tethers 
operate independently of bound- 
ary elements, which mediate the 
opposite function of blocking 
spurious regulatory interac- 
tions between neighboring loci. 
This work sheds light on how 
genome organization controls 
the dynamics of gene expression 
underlying complex develop- 
mental processes. —BAP 

Science, abi7178, this issue p. 566; 

see also abn6380, p. 491 


T CELLS 
Skin dwellers for 
the long haul 


Transplantation with allogeneic 
hematopoietic stem cells after 
myeloablative conditioning 
enables near total replacement 
of host blood cells by donor 
cells. To ascertain whether 
skin-resident memory T cells are 
also replaced by donor T cells 
after therapeutic hematopoietic 
transplantation, de Almeida et 
al. used single-cell chimerism 
analysis of patient blood and 
skin T cells at multiple post- 
transplantation time points. 
Long-term chimerism of host 
T cells in skin was observed in 
23% of patients. These patients 
retained a small number of host 
T cells in blood with features of 
tissue-resident lymphocytes, 
suggesting mobilization into the 
circulation after tissue residency. 
These studies open the door 
to learning more about tissue- 
resident human T cells through 
the analysis of patients with 
long-term chimerism of host 
skin T cells after hematopoietic 
transplantation. —IRW 

Sci. Immunol. 7, eabe2634 (2022). 
CHEMICAL POLLUTION 


Per- and polyfluoroalkyl substances 
in the environment 


Marina G. Evich†, Mary J. B. Davis†, James P. McCord†, Brad Acrey, Jill A. Awkerman, Detlef R. U. Knappe, 
Andrew B. Lindstrom, Thomas F. Speth, Caroline Tebes-Stevens, Mark J. Strynar, Zhanyun Wang, 
Eric J. Weber, W. Matthew Henderson*, John W. Washington* 


BACKGROUND: Dubbed “forever chemicals” be- 
cause of their innate chemical stability, per- 
and polyfluoroalkyl substances (PFAS) have 
been found to be ubiquitous environmental 
contaminants, present from the far Arctic 
reaches of the planet to urban rainwater. Al- 
though public awareness of these compounds 
is still relatively new, PFAS have been manu- 
factured for more than seven decades. Over that 
time, industrial uses of PFAS have extended 
to >200 diverse applications of >1400 indi- 
vidual PFAS, including fast-food containers, 
anti-staining fabrics, and fire-suppressing 
foams. These numerous applications are pos- 
sible and continue to expand because the 
rapidly broadening development and manu- 
facture of PFAS is creating a physiochemically 
diverse class of thousands of unique syn- 
thetic chemicals that are related by their use 
of highly stable perfluorinated carbon chains. 
As these products flow through their life 
cycle from production to disposal, PFAS can 
be released into the environment at each step 
and potentially be taken up by biota, but 
largely migrating to the oceans and marine 
sediments in the long term. Bioaccumulation 
in both aquatic and terrestrial species has 
been widely observed, and while large-scale 
monitoring studies have been implemented, 
the adverse outcomes to ecological and hu- 
man health, particularly of replacement PFAS, 
remain largely unknown. Critically, because 
of the sheer number of PFAS, environmental 
discovery and characterization studies strug- 
gle to keep pace with the development and 
release of next-generation compounds. The 
rapid expansion of PFAS, combined with their 
complex environmental interactions, results 
in a patchwork of data. Whereas the oldest 
legacy compounds such as perfluoroalkyl- 
carboxylic acids (PFCAs) and perfluoroalkanesul- 
fonic acids (PFSAs) have known health impacts, 
more recently developed PFAS are poorly 
characterized, and many PFAS even lack de- 
fined chemical structures, much less known 
toxicological end points. 


The PFAS life cycle. PFAS product flows from primary producer to commercial user to consumers to disposal. 
Each step is attended by atmospheric and aqueous fugitive releases. Soils constitute a long-term environmental 
sink, slowly releasing PFAS to the hydrosphere and allowing uptake in biota, but the ultimate reservoir is deep 
marine sediment. 
ADVANCES: Continued measurement of legacy 
and next-generation PFAS is critical to assess- 
ing their behavior in environmental matrices 
and improving our understanding of their fate 
and transport. Studies of well-characterized 
legacy compounds, such as PFCAs and PFSAs, 
aid in the elucidation of interactions between 
PFAS chemistries and realistic environmental 
heterogeneities (e.g., pH, temperature, min- 
eral assemblages, and co-contaminants). How- 
ever, the reliability of resulting predictions 
depends on the degree of similarity between 
the legacy and new compounds. Atmospheric 
transport has been shown to play an impor- 
tant role in global PFAS distribution and, after 
deposition, mobility within terrestrial settings 
decreases with increasing molecular weight, 
whereas bioaccumulation increases. PFAS de- 
gradation rates within anaerobic settings and 
within marine sediments sharply contrast those 
within aerobic soils, resulting in considerable 
variation in biotransformation potential and 
major terminal products in settings such as 
landfills, oceans, or soils. However, regardless 
of the degradation pathway, natural transforma- 
tion of labile PFAS includes PFAS reaction 
products, resulting in deposition sites such as 
landfills serving as time-delayed sources. Thus, 
PFAS require more drastic, destructive reme- 
diation processes for contaminated matrices, 
including treatment of residuals such as granular 
activated carbon from drinking water reme- 
diation. Destructive thermal and nonthermal 
processes for PFAS are being piloted, but there 
is always a risk of forming yet more PFAS 
products by incomplete destruction. 


OUTLOOK: Although great strides have been 
taken in recent decades in understanding the 
fate, mobility, toxicity, and remediation of PFAS, 
there are still considerable management con- 
cerns across the life cycle of these persistent 
chemicals. The study of emerging compounds 
is complicated by the confidential nature of 
many PFAS chemistries, manufacturing pro- 
cesses, industrial by-products, and applications. 
Furthermore, the diversity and complexity of 
affected media are difficult to capture in lab- 
oratory studies. Unquestionably, it remains a 
priority for environmental scientists to under- 
stand behavior trends of PFAS and to work 
collaboratively with global regulatory agencies 
and industry toward effective environmental 
exposure mitigation strategies. 


The list of author affiliations is available in the full article online. 
*Corresponding author. Email: henderson.matt@epa.gov 
(W.M.H.); washington.john@epa.gov (J.W.W.) 
Over the past several years, the term PFAS (per- and polyfluoroalkyl substances) has grown to be 
emblematic of environmental contamination, garnering public, scientific, and regulatory concern. PFAS 
are synthesized by two processes, direct fluorination (e.g., electrochemical fluorination) and 
oligomerization (e.g., fluorotelomerization). More than a megatonne of PFAS is produced yearly, and 
thousands of PFAS wind up in end-use products. Atmospheric and aqueous fugitive releases during 
manufacturing, use, and disposal have resulted in the global distribution of these compounds. Volatile 
PFAS facilitate long-range transport, commonly followed by complex transformation schemes to 
recalcitrant terminal PFAS, which do not degrade under environmental conditions and thus migrate 
through the environment and accumulate in biota through multiple pathways. Efforts to remediate PFAS- 
contaminated matrices still are in their infancy, with much current research targeting drinking water. 


The ubiquitous presence of per- and poly- 
fluoroalkyl substances (PFAS) in the 
environment after decades of manufac- 
turing and consumer use (Fig. 1) has 
garnered global interest, with an ever- 
expanding inventory of >1400 individual chem- 
icals in the Toxic Substances Control Act 
Inventory and >8000 unique known struc- 
tures (1). PFAS have been incorporated in 
>200 use areas ranging from industrial- 
mining applications to food production and 
fire-fighting foams because of the innate 
chemical and thermal stability of the carbon- 
fluorine bond and ability to repel oil and water 
(2). As PFAS flow through commerce from 
primary manufacturer to commercial user 
to final disposal, environmental release oc- 
curs through both controlled and fugitive 
waste streams. The stability of many PFAS 
degradants fosters their ubiquity in the en- 
vironment. The growing number of PFAS 
susceptible to partial degradation (3) further 
complicates environmental fingerprinting and 
remediation efforts. Whereas some PFAS trans- 
formation pathways have been well charac- 
terized, others degrade through as-yet unknown 
pathways, expanding the already immense 
PFAS inventory by untold numbers. Of the 
known PFAS, there is a paucity of data ad- 
equately describing potential impacts to eco- 
systems and their provisioning services, and 
few of these chemicals are well characterized 
by ecotoxicity studies, with the widely known 
perfluorooctanoic acid (PFOA) and perfluoro- 
octane sulfonic acid (PFOS) alone covering 
21 and 39% of the ECOTOX Knowledgebase 
(4), respectively. Furthermore, with their 
detection in sera across the human popula- 
tion, coupled with epidemiological evidence of 
the health impacts for legacy PFAS (5, 6), in- 
formation on associations with human disease 
for emerging PFAS is needed. With global 
production volumes of fluoropolymers surpas- 
sing 230,000 tonnes/year (2) and estimated 
cumulative global emissions of perfluoroalkyl 
acids totaling ≥46,000 tonnes (7), scientists 
struggle to keep pace with manufacturing, use 
(Fig. 1), and subsequent release. Here, we sum- 
marize central concerns in PFAS production, 
persistence, environmental mobility, exposure, 
and remediation to inform the international 
community. 


Major PFAS groups and uses 


PFAS are a class of substances within a wide 
universe of organofluorine compounds (8), as 
first laid out by Buck et al. in 2011 (9). In 2021, 
the Organisation for Economic Cooperation 
and Development released a revised definition 
of PFAS, “PFAS are fluorinated substances 
that contain at least one fully fluorinated 
methyl or methylene carbon atom (without 
any H/Cl/Br/I atom attached to it)” (10). This 
revised definition is more inclusive with un- 
ambiguous inclusion of PFAS such as side- 
chain fluorinated aromatics (Fig. 2) (11, 12). 
By contrast, most historical work within the 
research community has focused on a small 
set of perfluoroalkyl(ether) acids and their 
precursors, with an emphasis on environ- 
mental and biological occurrence investiga- 
tions. Whereas the persistence associated with 
the perfluorinated-carbon chain is a funda- 
mental underlying concern, PFAS also have a 
wide range of bioaccumulation and adverse- 
effect concerns, governed by their varied physio- 
chemical properties. 
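
As an illustration of the revised definition quoted above, the following Python sketch checks a toy molecular graph for a fully fluorinated methyl (-CF3) or methylene (-CF2-) carbon. The graph encoding, the function name `has_fully_fluorinated_carbon`, and the two example molecules are assumptions made here for illustration; they are not taken from the OECD document or from this review. The sketch also ignores the exceptions noted by the OECD and, as a simplification, counts only saturated carbons bearing two or three fluorine atoms.

```python
from typing import Dict, List, Tuple

# Atoms that disqualify a carbon under the quoted definition.
BLOCKING = {"H", "Cl", "Br", "I"}


def has_fully_fluorinated_carbon(
    elements: Dict[int, str], bonds: List[Tuple[int, int]]
) -> bool:
    """Return True if any saturated carbon is a -CF3 or -CF2- unit with
    no H/Cl/Br/I attached (simplified reading of the definition)."""
    neighbors: Dict[int, List[str]] = {i: [] for i in elements}
    for a, b in bonds:  # each bond listed once, regardless of bond order
        neighbors[a].append(elements[b])
        neighbors[b].append(elements[a])

    for idx, element in elements.items():
        if element != "C":
            continue
        attached = neighbors[idx]
        if len(attached) != 4:  # assumption: only saturated carbons count
            continue
        if any(x in BLOCKING for x in attached):
            continue
        if attached.count("F") in (2, 3):  # -CF2- or -CF3
            return True
    return False


# Hypothetical encodings of two small molecules.
# Trifluoroacetic acid, CF3-COOH: the CF3 carbon qualifies.
tfa_elements = {0: "C", 1: "F", 2: "F", 3: "F", 4: "C", 5: "O", 6: "O", 7: "H"}
tfa_bonds = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (6, 7)]

# Chlorodifluoromethane, CHClF2: H and Cl on the carbon disqualify it.
hcfc_elements = {0: "C", 1: "H", 2: "Cl", 3: "F", 4: "F"}
hcfc_bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]

print(has_fully_fluorinated_carbon(tfa_elements, tfa_bonds))
print(has_fully_fluorinated_carbon(hcfc_elements, hcfc_bonds))
```

A real screening workflow would rely on a cheminformatics toolkit and the full text of the definition rather than this hand-built adjacency list.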

Although industrial reviews include general 
synthetic routes and major applications of 
some PFAS groups (13), inadequate public 
information exists for many PFAS interna- 
tionally, particularly those currently in use, be- 
cause of confidential business information 
claims and insufficient regulatory structures 
(14-16). Critical data gaps include PFAS iden- 
tities, locations and quantities of production 
and processing, and final uses of products, 
limiting the capability to identify where envi- 
ronmental and human exposure occur. Here, 
we summarize synthetic routes, structural traits, 
and uses of the major PFAS groups (Figs. 1 and 
2) and describe implications and knowledge 
gaps for future research and action. 

The fluorine in PFAS is mined from fluorite 
(CaF₂) mineral deposits, which is digested to 
form hydrofluoric acid (HF) (Fig. 1). HF and 
other non-PFAS-based chemicals are used in 
either of two general synthetic techniques to 
produce starting materials (e.g., perfluoro- 
alkanoyl fluorides in Fig. 2) of individual PFAS 
groups, namely direct fluorination (i.e., turn- 
ing nonfluorinated to fluorinated substances; 
e.g., electrochemical fluorination) and oligo- 
merization (i.e., converting monomers to larger 
molecules; e.g., fluorotelomerization). Direct 
fluorination is aggressive and often results in 
uncontrolled chemical reactions such as car- 
bon chain shortening and rearrangement 
(17-19), leading to a wide range of by-products 
including cyclic and branched isomers. Oligo- 
merization is less aggressive and mainly results 
in a homologous series of target compounds 
(9), as have been observed near fluoropolymer 
(20) and perfluoropolyether (21) manufactur- 
ing and processing sites. Within individual 
PFAS groups, the functional moieties of start- 
ing materials may further react following 
conventional reaction pathways to yield dif- 
ferent PFAS (9); thus, depending on the com- 
plexity of synthetic routes, final products may 
contain a number of unreacted intermediates 
and degradation products (22, 23). Whereas 
the summary below focuses on target and/or 
intentional PFAS, these unintentional PFAS 
can constitute an important part of human 
and environmental exposure and merit scrutiny. 
Fig. 1. Non-exhaustive summary of PFAS manufacturing, from production to consumer use. Numerous product fluxes are reasonably documented, but 
considerable lacunae remain. See text for details and citations. HFC, hydrofluorocarbon; HCFO, hydrochlorofluoroolefin; HFO, hydrofluoroolefin; HFE, hydrofluoroether; 
PASF, perfluoroalkanesulfonyl fluoride. 


Major PFAS groups from direct fluorination 
include those hydrofluorocarbons, hydrofluoro- 
ethers, hydrochlorofluoroolefins, and hydro- 
fluoroolefins that contain a -CF₃ moiety and 
have an overall global production of >1 
megatonne/year (24). Including a range of 
low-molecular-weight and low-boiling-point 
compounds that are used as refrigerants, heat- 
transfer fluids, solvents, and foaming agents 
(2, 24), these compounds replaced ozone- 
depleting chlorofluorocarbons and hydro- 
chlorofluorocarbons. Because of their high 
global-warming potential, the international 
community has agreed to phase down and 
eventually eliminate hydrofluorocarbons (25, 26). 
An ongoing industrial transition is taking 
place, including increasing large-scale replace- 
ment of hydrofluorocarbons with hydro- 
fluoroethers and hydrofluoroolefins. Although 
they have low global-warming potentials, 
hydrofluoroethers and hydrofluoroolefins 
can ultimately degrade to highly persistent 
perfluoroalkylcarboxylic acids (PFCAs) such as 
trifluoroacetate, and a steep accumulation of 
trifluoroacetate in the environment is becom- 
ing increasingly evident (27). 

Another important PFAS group resulting 
from direct fluorination is side-chain fluori- 
nated aromatics (11, 12), with unknown but 
likely considerable amounts being produced 
and used annually. A common starting point 
is the synthesis of benzotrifluorides from 
benzotrichlorides by reaction with HF (8). 
Addition of the -CF₃ moiety can reduce biol- 
ogical degradation, increase biological ac- 
tivity, and assist with membrane transport, 
making the parent compound longer lasting 
or more effective; therefore, many side-chain 
fluorinated aromatics are used in pharmaceu- 
tical (12) or agricultural (11) applications. These 
substances can also degrade to PFCAs such as 
trifluoroacetate. 

Two other major PFAS groups produced 
from direct fluorination include perfluoroalkyl- 
tert-amines (28) and perfluoroalkanoyl/ 
perfluoroalkanesulfonyl fluorides (PACF/ 
PASFs), which are further reacted to produce 
PFCAs, perfluoroalkanesulfonates (PFSAs), and 
other derivatives (Fig. 2). Historically, hundreds 
of PACF/PASF-based derivatives with a wide 
range of perfluorocarbon-chain lengths were 
produced, on the order of kilotonnes/year 
(5, 29), and used for industrial and consumer 
applications (2). Since the early 2000s, num- 
erous long-chain (fluoroalkyl carbon num- 
ber ≥6) PACF/PASF-based derivatives have 
been—and are being—phased out because of 
widespread concern, whereas shorter-chain 
PACF/PASF-based derivatives still are being 
produced and widely used, although in un- 
known amounts (5, 29). In the environment 
and biota, PACF/PASF-based derivatives may 
degrade and partially transform into different 
PFCAs and/or PFSAs. 

On the oligomerization side, two major PFAS 
groups are fluoropolymers and perfluoropoly- 
ethers. These are high-production polymers 
having fluorinated backbones, with fluoro- 
polymers being produced on the scale of 
100 kilotonnes/year and unknown but likely 
considerable amounts for perfluoropolyethers. 
Despite often having simple names such as 
polytetrafluoroethylene, substances in these 
two groups can be highly diverse, including 
both nonfunctionalized (with -CF3) and func- 
tionalized termini, with different structural 
combinations and molar ratios of monomers 
(for copolymers), and from low (< 1000 Da) to 
very high (> 100,000 Da) molecular weight 
(30-32); this complexity has not been clearly 
communicated with a comprehensive over- 
view of different fluoropolymers and perfluoro- 
polyethers on the market. Depending on 
structure, different fluoropolymers and per- 
fluoropolyethers can be used in a range of 
industrial and consumer applications (2); in 
some applications, perfluoropolyethers are 
used as alternatives to PACF/PASF-based 
derivatives. Given their variety and complex- 
ity, their subsequent bioavailability and de- 
gradability are highly variable and complex, 
which is generally overlooked, understudied, 
and/or unknown. 

Three other major PFAS groups formed 
from oligomerization are fluorotelomers, per- 
fluoroalkyl(ether) carboxylic and sulfonic acids, 
and perfluoroalkene derivatives. Fluorotelo- 
mers share many similarities to PACF/PASF- 
based derivatives other than perfluoroalkyl 
(ether) acids, including molecular structures, 
degradability (9, 23, 29), use applications (2), 
and manufacturing trends from a wide range of 
perfluorocarbon chain lengths to predominant- 
ly shorter chains. Fluorotelomers were histor- 
ically produced on the order of 9 kilotonnes/ 
year (33), with the current amounts produced 
unknown. Unknown amounts of perfluoro- 
alkyl(ether) carboxylic and sulfonic acids 
are being used to replace long-chain PFCAs 
and PFSAs (34) in industrial applications 
such as fluoropolymer production and metal 
plating, respectively. Perfluoroalkene deriv- 
atives such as p-perfluorous nonenoxyben- 
zene sulfonate have been produced since the 
1980s; large-scale production (on the scale of 
kilotonnes/year) was recently initiated in China 
as an alternative to PFOS in firefighting and oil 
production (35). Despite having an unsaturated 
bond, p-perfluorous nonenoxybenzene sulfo- 
nate is not readily biodegradable (36). 


Environmental stability, degradation schemes, 
and transformation rates 


Despite typically having high stability as a 
group, ~20% of PFAS may undergo transfor- 
mation in the environment (3). These labile 
compounds are precursors to recalcitrant, ter- 
minal transformation products such as PFCAs 
and PFSAs. For example, frequently detected 
precursors including perfluorooctane sulfona- 
mides, fluorotelomer alcohols (FTOHs), and 
fluorotelomer sulfonates, have been found to 
contribute up to 86% of total PFAS identified 
in wastewater-treatment plant sludge (37). 
Although PFAS can undergo complete de- 
gradation to inorganic components using high- 
energy remediation technologies, precursor 
transformations under environmental condi- 
tions, including processes such as hydrolysis 
(38), oxidation (39, 40), reduction, decarboxyl- 
ation, and hydroxylation (41), ultimately yield 
stable PFAS. Despite the low vapor pressure 
and high water solubilities of many PFAS, 
some conditions (e.g., within industrial stacks) 
can promote partitioning to air through par- 
ticulate sorption, and volatile PFAS such as 
FTOHs can exist in the gas phase (42), making 
atmospheric and photochemical transforma- 
tion possible. In the soil-water environment, 
microbe-facilitated functional group biotrans- 
formation can occur aerobically (43, 44) or 
anaerobically (45-47), and some microbes that 
carry out these reactions have been identified 
(46, 48, 49). Biotransformation of labile PFAS 
also can be mediated by plant-specific en- 
zymes. For example, microbial transformation 
of 8:2 FTOH was substantially enhanced with 
the addition of soybean root exudates in solu- 
tion (50), and perfluorooctane sulfonamide 
was transformed in the presence of carrot 
and lettuce crops, but not in their absence, in 
amended soils (51). In both studies, enhanced 
degradation was attributed to the organic car- 
bon content of the soil, because the addition of 
carbon sources can increase microbial degrada- 
tion rates through co-metabolic processes (52). 

Several PFAS can undergo transformation, 
resulting in the formation of FTOHs through 
processes such as oxidation, reduction (53), de- 
sulfonation (54), and hydrolysis (38, 55-58) 
(Fig. 3A). Although some fluorotelomers evi- 
dently transform without forming intermediate 
FTOHs (9, 22, 49, 59), one of the archetypal 
“legacy PFAS” transformation schemes involves 
FTOHs that are subject to (bio)transformation 
through numerous intermediates, leading to 
the formation of terminal PFCA through 
chain-shortening processes (Fig. 3A). The ef- 
ficiency of these transformations decreases 
from aerobic to anoxic to anaerobic (60, 61) 
conditions, and PFCA yields and rates of for- 
mation depend on specific precursor and trans- 
formation conditions (9). On average, PFOA 
yields from 8:2 FTOH were reported to be 25% 
in aerobic soils compared with <1% in an- 
aerobic sludge (62). This process is initiated 
by the oxidation of 8:2 FTOH to yield the in- 
ferred 8:2 fluorotelomer aldehyde and then the 
8:2 fluorotelomer carboxylic acid, which is re- 
duced through the loss of F to form 7:3 unsat- 
urated fluorotelomer acid, which can form the 
terminal acid perfluorohexanoic acid (53, 63, 64) 
(Fig. 3A). A key step in the pathway is hydro- 
xylation in the β position and subsequent oxi- 
dation to form the 7:3 3(keto) fluorotelomer 
carboxylic acid, which then undergoes β-oxidation 
to form PFOA, as well as α-decarboxylation to 
form the 7:2 ketone (53, 63, 64). The ketone 
then is reduced to form the secondary alcohol, 
1-perfluoroheptyl ethanol [also known as 7:2 
(sec) FTOH], which is oxidized to form PFOA 
(53, 63, 64). 

In a second major transformation scheme, 
N-ethyl perfluorooctane sulfonamido ethanol 
is proposed to oxidize to form the aldehyde 
and subsequently to N-ethyl perfluorooc- 
tane sulfonamidoacetic acid (Fig. 3B) (65, 66). 
N-deacetylation of N-ethyl perfluorooctane 
sulfonamidoacetic acid then leads to the for- 
mation of N-ethyl perfluorooctane sulfonamide 
followed by C-hydroxylation to form perfluoro- 
octane sulfonamido ethanol. Oxidation of 
perfluorooctane sulfonamido ethanol to per- 
fluorooctane sulfonamido acetic acid is pro- 
posed to occur through the perfluorooctane 
sulfonamide aldehyde. N-deacetylation of per- 
fluorooctane sulfonamido acetic acid to form 
perfluorooctane sulfonamide is then observed. 
Perfluorooctane sulfonamide may also form 
directly from the N-dealkylation of N-ethyl 
perfluorooctane sulfonamide (65, 66). Deami- 
nation of perfluorooctane sulfonamide to form 
perfluorooctane sulfinic acid is commonly fol- 
lowed by oxidation to form the terminal pro- 
duct, PFOS. 

PFAS transformation under environmental 
conditions can be approximated using first- 
order kinetics (67). Environmental degrada- 
tion of labile precursors is observed to occur in 
a “tree structure,” with the formation of num- 
erous intermediates along branching transfor- 
mation pathways (53, 68). Along each branch, 
the formation and disappearance of interme- 
diates can be modeled as a sequential decay 
chain (23), with each step characterized by a 
pseudo first-order rate constant (67). 
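
The sequential decay chain described above can be sketched for a single branch with two steps (precursor to intermediate to terminal product). This is a minimal illustration, not the model of the cited studies: the function name and the rate constants are hypothetical, and the sketch assumes one unbranched pathway with complete conversion, whereas real pathways branch and give partial yields.

```python
import math

def decay_chain(c0, k1, k2, t):
    """Concentrations along A -> B -> C at time t.

    c0 : initial precursor concentration (B and C start at zero)
    k1 : pseudo first-order rate constant for A -> B (1/day)
    k2 : pseudo first-order rate constant for B -> C (1/day)
    """
    a = c0 * math.exp(-k1 * t)
    if math.isclose(k1, k2):
        # Limiting form when both steps have the same rate constant.
        b = c0 * k1 * t * math.exp(-k1 * t)
    else:
        b = c0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    c = c0 - a - b  # mass balance: the rest has reached the terminal product
    return a, b, c

# Hypothetical rate constants, chosen only for illustration.
precursor, intermediate, terminal = decay_chain(c0=1.0, k1=0.1, k2=0.5, t=10.0)
```

Because the intermediate is formed and consumed at the same time, its concentration rises, peaks, and then declines, while the terminal product only accumulates.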

In soils and sediment, sorption can slow 
the observed rate of microbial transformation 
(69). With long-chain PFAS preferentially ad- 
sorbing to soil phases, molecular weight can 
be used as an approximate indicator of relative 
stability among PFAS sharing common reac- 
tion centers (43). To address the effects of re- 
versible sorption, some have proposed use of 
a double-first-order, in-parallel model (67), 
wherein rate-limited reversible sorption is in- 
cluded as a first-order process. 
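
A double-first-order, in-parallel curve is commonly written as the sum of two exponential terms. The sketch below uses that generic form as an assumption; the parameter names (`g`, `k_fast`, `k_slow`) are hypothetical, and the exact parameterization used in (67) may differ.

```python
import math

def dfop(c0, g, k_fast, k_slow, t):
    """Remaining concentration under a double-first-order, in-parallel model.

    g      : fraction of c0 assigned to the fast compartment (0 to 1)
    k_fast : first-order rate constant of the fast compartment (1/day)
    k_slow : first-order rate constant of the slow, sorption-limited
             compartment (1/day)
    """
    return c0 * (g * math.exp(-k_fast * t) + (1.0 - g) * math.exp(-k_slow * t))

# Hypothetical parameters: 60% readily available, 40% sorption limited.
remaining = [dfop(1.0, 0.6, 0.3, 0.01, t) for t in (0, 10, 50, 100)]
```

With `g = 1` the expression collapses to simple first-order decay, which is why the model is a natural extension when sorption slows the later part of the curve.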

In addition to sorption, transformation rate 
is dependent on a number of other environ- 
mental factors including pH, temperature, and 
microbial population (70), and these factors 
contribute to a wide variation of reported pre- 
cursor half-lives. For example, biodegradation 
studies of N-ethyl perfluorooctane sulfona- 
mido ethanol in sludge reported a half-life 
of 0.7 to 4.2 days, yet the biodegradation in 
marine sediments was found to proceed at 
much slower rates (t1/2, 4 °C = 160 days and t1/2, 
25 °C = 44 days), which could explain reports 
of elevated concentrations of N-ethyl per- 
fluorooctane sulfonamido ethanol in marine 
environments (66). Similarly, the anaerobic 
biotransformations of 6:2 and 8:2 FTOHs 
slowed substantially (30 and 145 days, respec- 
tively) compared with aerobic conditions (<2 
and 2 to 7 days, respectively) (62), which can 
foster enhanced levels of telomer acids [e.g., 
5:3 fluorotelomer carboxylic acid by hydro- 
genation of the 5:3 fluorotelomer unsaturated 
carboxylic acid (53)] in landfills (71). Therefore, 
PFAS that typically are intermediates in ox- 
idizing settings may exist as terminal products 
under reducing conditions. For example, var- 
iations in PFAS species detected in leachate 
from waste collection vehicles compared with 
landfill leachate suggest alternative biodeg- 
radation pathways in long-term anaerobic 
settings such as landfills (72). Consequently, 
degradation studies conducted under con- 
trolled conditions result in considerable var- 
iation in biotransformation potential and 
possibly different major stable perfluorinated 
degradation products when extrapolating half- 
lives and major products from laboratory to 
environmental conditions. 
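
To compare half-lives reported for different settings, it can help to convert them to first-order rate constants and to the fraction remaining after a fixed time. The sketch below assumes simple first-order decay; the function names are hypothetical, and the 30-day comparison window is chosen only for illustration.

```python
import math

def rate_constant(half_life_days):
    """First-order rate constant (1/day) from a half-life in days."""
    return math.log(2) / half_life_days

def fraction_remaining(half_life_days, t_days):
    """Fraction of the initial precursor left after t_days."""
    return 0.5 ** (t_days / half_life_days)

# Half-lives quoted in the text for N-ethyl perfluorooctane
# sulfonamido ethanol: 0.7 to 4.2 days in sludge, 44 and 160 days
# in marine sediments.
after_30_days = {hl: fraction_remaining(hl, 30.0) for hl in (0.7, 4.2, 44.0, 160.0)}
```

Under these assumptions, less than 1% of the precursor remains after 30 days at the sludge half-lives, whereas more than half remains at either marine-sediment half-life.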

Fig. 3. Breakdown pathways of classes of PFAS. Shown are reaction schemes for 8:2 FTOH (47, 53, 63) (A) and N-EtFOSE (65, 66) (B). Transformation products 
proposed by the original investigators are shown with brackets. 

In addition to accounting for environmental 
conditions (67), another complicating factor is 
that contaminants commonly exist as components 
in complex mixtures. One common precursor 
source is aqueous-film-forming foam (AFFF), 
formulations of which contain mixtures of PFAS, 
and co-contaminants such as nonfluorinated 
surfactants. High concentrations of organic 
solvents have been shown to inhibit PFOA 
degradation under in situ remedial chemical 
oxidation studies, suggesting that interactions 
of PFAS with other non-PFAS co-contaminants 
can alter PFAS transformation (40). Additionally, 
the presence of different PFAS has resulted in 
changing compositions of microbial communities 
when comparing cultures spiked with PFOA or 
PFOS against microbial compositions without 
PFAS (46). Considering that PFAS environmental 
transformation is mediated primarily by microbes, 
data suggest that the presence of complex 
mixtures could indirectly alter biodegradation 
and that the presence of one PFAS may affect the 
transformation rate of another, although 
transformation kinetics of PFAS mixtures has not 
been reported. Furthermore, 
these complex mixtures could have down- 
stream implications for PFAS mobility, because 
co-contaminants in AFFF mixtures affect mi- 
crobial toxicity and PFAS solubility, parti- 
tioning (73), and remediation [PFAS can be 
transformed during treatment of organic con- 
taminants (39)]. 

Taken together, the complexity of real-world 
environmental conditions acting on primary 
precursors, intermediates and terminal pro- 
ducts can result in divergence from reaction 
schemes and degradation rates derived under 
laboratory conditions. These complexities are 
aggravated by the many experimental chal- 
lenges associated with larger PFAS such 
as fluoropolymers and side-chain fluorinated 
polymers, the structure and monomeric com- 
positions of which often are not completely 
characterized (23, 38, 74). In addition, there 
remain uncertainties regarding the levels of 
impurities or synthetic by-products and life 
cycle emissions of these polymers, which may 
affect degradation rates, further necessitat- 
ing nontargeted analyses in conjunction with 
transformation prediction simulators such as 
EnviPath (75) and the Chemical Transforma- 
tion Simulator (76) to identify new PFAS and 
transformation products in the environment. 


Environmental mobility and distribution 


The mobility of PFAS in the environment is 
dictated by properties of the mobile (usually 
air and water) and immobile phases [e.g., nat- 
ural organic matter (NOM) and mineral as- 
semblages] as well as the PFAS species. The 
transformation rates discussed above affect 
the time available for migration. When trans- 
formation rates of short-lived intermediates 
exceed environmental transport rates, these 
intermediates can remain proximate to their 
precursors, a phenomenon well established 
for the environmental distribution of short- 
lived radionuclides (77) because of secular 
(radio-decay) equilibrium with long-lived 
parents (78). Further, this secular equilibrium 
of short-lived intermediates might contribute 
to the undetectable status of some inferred 
compounds (e.g., 2-perfluorooctyl acetalde- 
hyde; Fig. 3). For PFAS with intermediate 
transformation rates (e.g., FTOHs and fluoro- 
telomer unsaturated carboxylic acids; Fig. 3) 
relative to environmental transport processes, 
these compounds can migrate considerable 
distances before transformation to recalcitrant 
PFAS, thereby dispersing widely in the envi- 
ronment (79). 
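
The secular-equilibrium argument can be illustrated with a two-member chain in which a slowly transforming parent feeds a short-lived intermediate. This is a sketch under first-order kinetics with hypothetical rate constants and function names; it requires the two rate constants to differ.

```python
import math

def intermediate_to_parent_ratio(k_parent, k_intermediate, t):
    """Exact [intermediate]/[parent] at time t, starting from pure parent.

    Both steps are first order; k_parent must differ from k_intermediate.
    """
    dk = k_intermediate - k_parent
    return k_parent / dk * (1.0 - math.exp(-dk * t))

def secular_ratio(k_parent, k_intermediate):
    """Limiting ratio when the intermediate decays much faster than the parent."""
    return k_parent / k_intermediate

# Hypothetical values: the intermediate reacts 1000 times faster than the parent.
exact = intermediate_to_parent_ratio(k_parent=0.001, k_intermediate=1.0, t=50.0)
limit = secular_ratio(k_parent=0.001, k_intermediate=1.0)
```

In this regime the intermediate stays near a small, constant fraction of the parent concentration, which is consistent with fast-reacting intermediates remaining close to their precursors and being hard to detect.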

Early precursor PFAS include volatile spe- 
cies (FTOHs and sulfonamido ethanols; Fig. 3), 
the presence of which has been established 
globally (80-82). Atmospheric residence time 
governs transport distance (83) and depends 
on a variety of PFAS properties, including 
volatility, reactivity, molecular weight, and 
vapor-particulate partitioning (82, 84, 85). 
Atmospheric lifetimes have been reported for 
FTOHs of ~20 days (86). Consistent with these 
atmospheric lifetimes, air samples collected at 
remote oceanic locations are reported to con- 
tain several FTOH and/or perfluorosulfonamido 
ethanol species in both gas and particulate 
phases (80). On the basis of these and related 
observations, a large portion of PFAS global 
distribution, including that to remote regions, 
has been attributed to atmospheric transport 
(79, 87). For example, in a study of soils col- 
lected from remote sites globally, all samples 
contained PFAS, with homolog ratios [e.g., 
PFOA/perfluorononanoic acid (PFNA)] con- 
sistent with atmospheric transport (79). These 
soil concentrations have been used to define 
global-background PFAS ranges in surface 
soils (means ~10 to 60 pg/g), such that surface 
soils rarely contain lower PFAS, and higher 
concentrations suggest local or regional sources 
(88). Atmospherically transported ionic PFAS 
also have been shown to disperse widely, per- 
haps as far afield as >400 km (27, 89, 90), al- 
though the form of these species, e.g., free acid, 
dissolved in droplets or sorbed to particulates, 
has not been resolved. 

In terrestrial settings, PFAS transport usu- 
ally occurs through aqueous advection, with 
migration retarded by sorption on NOM, min- 
erals, and at fluid-fluid interfaces (particu- 
larly air-water) (91). Most PFAS sorption studies 
have been conducted with surface soils in 
which NOM, which is typically present at 
relatively high concentrations (Fig. 4) (92), con- 
stitutes a major substrate. Exploring surface- 
soil sorption mechanisms of two PFAS having 
sulfonate termini revealed an easily extract- 
able fraction, as well as less reversibly sorbed 
fractions composed of perfluoroalkyl groups 
hydrophobically associating with NOM, sul- 
fonate moieties covalently binding to NOM-OH 
groups forming ester linkages, and physical 
entrapment in NOM or minerals (93). Com- 
paring the sorption of cationic, zwitterionic, 
and anionic PFAS showed concentration- 
dependent sorption for cationic and zwitter- 
ionic PFAS, pronounced sorption hysteresis 
for zwitterions, and major electrostatic and 
NOM sorption for cationic and zwitterionic 
PFAS (94). 

The high NOM concentrations of surface 
soils typically diminish precipitously in the 
first several centimeters below the ground sur- 
face, where mineral surfaces come to domi- 
nate the vertically more expansive subsurface 
realm (Fig. 4) (92). Authigenic minerals typ- 
ically are abundant in the subsurface, and 
these minerals have surface charges for elec- 
trostatic sorption. Aluminosilicate clays bear 
permanent negative surface charges, pre- 
senting potential sorption sites for cationic and 
zwitterionic PFAS. Ferric and aluminum 
(oxy)hydroxides bear pH-dependent, positive 
surface charges below their zero point of charge 
at a pH of ~8, so these minerals can electro- 
statically sorb anionic PFAS. In the vadose 
zone, recent studies have shown that the sur- 
factant nature of PFAS also fosters sorption at 
the air-water interface, retarding PFAS migra- 
tion (91). 

To assess sorption across a wide breadth of 
PFAS species and complex sorption matrices, 
experiments have been performed on 29 PFAS 
in 10 soils (95). This study concluded that a 
simple distribution coefficient, Kd (soil/water 
concentration), effectively characterized rela- 
tive distribution among PFAS. Recognizing 
that lower values of log Kd favor partitioning 
to water, thereby favoring higher environ- 
mental mobility, general patterns in these 
data (Fig. 4A) include the following: (i) the 
distribution coefficient increases logarithmi- 
cally with fluoroalkyl carbon numbers >5, (ii) 
distribution coefficients converge to similar 
values among PFAS species and chain-lengths 
having fluorinated carbons <5, and (iii) for 
equal fluoroalkyl carbon numbers, sorption 
generally decreases according to zwitterions 
> sulfonamides > telomers > PFSAs > PFCAs 
> ethers. It also was observed that log Kd for 
anionic PFAS increased with decreasing pH, a 
pattern consistent with increasing positive elec- 
trostatic charge on pH-dependent surfaces of 
(oxy)hydroxide minerals and amorphous solids. 
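
The link between the distribution coefficient and mobility can be made concrete with a short calculation. The first function follows the soil/water concentration ratio defined above. The retardation factor is the standard linear-sorption expression from groundwater transport, brought in here as an assumption and not taken from (95). All numerical values and function names are hypothetical.

```python
import math

def log_kd(c_soil_ug_per_kg, c_water_ug_per_l):
    """log10 of the distribution coefficient Kd (L/kg)."""
    return math.log10(c_soil_ug_per_kg / c_water_ug_per_l)

def retardation_factor(kd_l_per_kg, bulk_density_kg_per_l, water_content):
    """Retardation factor R = 1 + (bulk density / water content) * Kd.

    Assumes linear, reversible, equilibrium sorption; R is the ratio of
    water velocity to solute velocity (R = 1 means no retardation).
    """
    return 1.0 + bulk_density_kg_per_l * kd_l_per_kg / water_content

# Hypothetical soil: bulk density 1.6 kg/L, volumetric water content 0.4.
weakly_sorbed = retardation_factor(0.5, 1.6, 0.4)
strongly_sorbed = retardation_factor(10.0, 1.6, 0.4)
```

A lower Kd gives a retardation factor closer to 1, so the compound moves at nearly the speed of the water, which matches the pattern of higher mobility at lower log Kd.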

When precursor degradation does not com- 
plicate interpretation (96), relative values of 
log Kd are reflected in PFAS distribution pat- 
terns across the spectrum of environmental 
settings. Figure 4B depicts geometric mean 
ratios (subsoil/surface soil) of PFAS for three 
soil profiles after biosolids application at the 
ground surface (97); consistent with log Kd 
values, subsoil accumulation of PFCAs exceeds 
that of PFSAs for the common fluoroalkyl number 8, 
shorter chains vary little from each other, and 
accumulation of shorter chains exceeds that of 
longer chains. 
It is noteworthy that subsoil accumulation for 
fluoroalkyl number >10 also varies little with 
chain length, perhaps reflecting facilitated 
transport of PFAS sorbed to colloids winnow- 
ing through the soil column (98). 

Transport of PFAS into terrestrial plants 
occurs through a variety of pathways, with the 
most studied being uptake through roots. As 
with transport in soils, vegetative accumula- 
tion factors (VAF = [PFAS]vegetation/[PFAS]soil) 
are influenced by the propensity of specific 
PFAS to partition into water as they are trans- 
ported through plants. These VAFs have re- 
vealed plant species- and tissue-specific trends 
(99-101). However, a recent review of VAFs 
across numerous species and tissues reported 
uniformly declining trends in total VAF with 
increasing fluoroalkyl number for PFCAs and 
PFSAs (102) (Fig. 4C) (101). VAF trends with 
chain length and among terminal moieties 
Fig. 4. PFAS partitioning in environmental media (log Kd). The environmental 
terminal moiety [(A) (95); pH = 5.2 values depicted]. Because of this partitioning 
(D) (105), (E) (106)], and terrestrial vegetation accumulation diminishes with 
increases with fluoroalkyl number [(C) (101)]. In aquatic settings, vegetative and 
detrital-feeder accumulation both increase with fluoroalkyl number [(F) (107)]. 
CEC/AEC, cation-exchange capacity/anion-exchange capacity. 


suggest that chemical properties of PFAS also 
exert a strong influence over plant uptake. Re- 
ports of plant uptake of emerging PFAS com- 
pounds are limited, but studies examining the 
concentration of chloroether sulfonic acids 
(F-53B, a replacement for PFOS in electro- 
plating industry) suggest similar variation with 
chain length (103). 

In contrast to the VAF patterns, which are 
largely governed by relative PFAS aqueous- 
sorbed partitioning, soil macroinvertebrates 
feeding directly on long-chain-rich vegetative 
detritus and NOM tend to express trends op- 
posite to that for VAFs. For example, macro- 
invertebrate accumulation factors (MAF = 
[PFAS]macroinvertebrate/[PFAS]soil) reported for 
earthworms (Eisenia andrei) in biosolid- 
amended soil have trends of increasing MAF 
with fluoroalkyl number (Fig. 4C) (104). 

After percolating through the vadose zone, 
relative PFAS mobility patterns have been re- 
ported in groundwater plumes. For example, 
PFAS concentrations were reported for wells 
in a groundwater plume flowing from a land- 
fill, to an observation well, and then to a water- 
supply well (105). Given travel times exceeding 
24 years for flow from the landfill to the water- 
supply well, several PFCA homologs fell to un- 
detectable levels, but perfluorobutanoic acid, 
perfluorohexanoic acid, and PFOA exhibited a 
pattern of lower downgradient/upgradient 
ratios (specifically, downgradient well 1/ 
upgradient well OW1f03) with increasing 
PFCA chain length (Fig. 4D). 

In a riverine setting, sediments downstream 
of a carpet industry have been reported to 
retain higher ratios of long-chain homologs 
than short (downstream site 5/upstream source 
site 4; Fig. 4E) (106), consistent with preferen- 
tial sorption of the longer homologs (perhaps 
affected by precursor transformation as well). 
In turn, this pattern also is expressed at the 
base aquatic autotrophic level; for example, 
aquatic vegetative-leaf accumulation (AVAF = 
[PFAS]vegetation/[PFAS]water; Fig. 4F) was rela- 
tively higher for long-chain compounds (107). 
Mirroring these AVAF trends, aquatic macro- 
invertebrate accumulation factors (AMAF = 
[PFAS]macroinvertebrate/[PFAS]sediment; Fig. 4F) 
for blackworms (Lumbriculus variegatus) in- 
crease with fluoroalkyl number as well (107). 
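
The accumulation factors used in this section (VAF, MAF, AVAF, and AMAF) share one form: the concentration in the organism divided by the concentration in the medium it is exposed to. The sketch below computes such ratios by fluoroalkyl carbon number. The concentrations are invented solely to illustrate opposing trends and are not data from the cited studies.

```python
def accumulation_factor(c_biota, c_medium):
    """Ratio of the concentration in biota to that in the exposure medium."""
    if c_medium <= 0:
        raise ValueError("medium concentration must be positive")
    return c_biota / c_medium

# Hypothetical concentrations (ng/g), keyed by fluoroalkyl carbon number.
soil = {4: 10.0, 6: 10.0, 8: 10.0}
plant = {4: 40.0, 6: 12.0, 8: 3.0}
earthworm = {4: 2.0, 6: 8.0, 8: 30.0}

vaf = {n: accumulation_factor(plant[n], soil[n]) for n in soil}
maf = {n: accumulation_factor(earthworm[n], soil[n]) for n in soil}
```

With these made-up numbers the plant factor falls and the earthworm factor rises as chain length increases, mirroring the opposite terrestrial trends described in the text.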


Environmental exposure 


Widespread global persistence of PFAS has 
resulted in detectable concentrations of the 
compounds in the blood of almost the entire 
human population (6). Human health effects 
from exposure to PFAS have been studied 
extensively, identifying possible carcinogenic, 
reproductive, endocrine, neurotoxic, dyslipide- 
mic, and immunotoxic effects (6, 108, 109). 
However, with animal models reflecting sim- 
ilar postulated mechanisms of action, the po- 
tential toxicity of these compounds for wildlife 
cannot be dismissed (110). For humans, direct 
exposure through manufactured products can 
be managed more expediently than indirect 
exposure to accumulated sources in aquatic 
ecosystems. PFAS exposures through food 
chains are more difficult to resolve, and diet- 
ary exposure through drinking water and con- 
taminated food sources (e.g., seafood and 
other animal products) are among the greatest 
exposure sources for ecosystems and human 
populations alike (109, 111). Here, we review 
the consequences of PFAS persistence in the 
environment and the resulting bioaccumula- 
tion in biota, present ecotoxicological details 
in the context of environmental distribution 
and exposure potential, and discuss the ecolo- 
gical effects of PFAS mixtures (112). 

Estimation of environmental exposure to 
PFAS is hindered by the sheer number of 
functionally diverse PFAS and is further 
complicated by their presence as complex 
mixtures. A fundamental understanding of 
ecotoxicology requires comprehensive knowl- 
edge of all PFAS species to which target orga- 
nisms have been exposed. Although pragmatic 
limitations have fostered studies reporting 
summary characterizations such as Total Or- 
ganic Fluorine and Total Oxidizable Precursor 
assays as proxies for more informative chemical- 
specific studies (113-116), more exhaustive ap- 
proaches providing identification of individual 
compounds within PFAS mixtures remain 
the more informative strategy (117, 118). Ide- 
ally, such characterizations would include de- 
tails regarding branched- versus linear-chain 
homologs, homolog ratios, isomer comparisons, 
and forensics with high-resolution mass spec- 
trometry. In addition to pinpointing potential 
point sources, these methods can distinguish 
between receptor contact with precursor com- 
pounds and their terminal products. 

An accurate assessment of PFAS risk must 
consider exposure to precursor compounds 
because these compounds transform and are 
thus important for characterizing environ- 
mental PFAS mixtures (119, 120). PFAS pre- 
cursors are susceptible to in vivo metabolic 
conversion to terminal acids or sulfonamides 
after exposure, as well as transformation 
during (or subsequent to) atmospheric or 
oceanic transport (see previous sections). For 
example, whereas PFSAs were the most abun- 
dant PFAS in both sediment and water at 
sites contaminated with AFFF (114), aquatic 
invertebrates exposed to AFFF displayed ele- 
vated concentrations of PFCAs as well as the 
6:2 fluorotelomer sulfonate (114, 115). Given the 
common detection of precursors, environmental- 
organismal uptake and distribution models 
should include both parent and degradant 
PFAS to best describe patterns of exposure 
and influence on biomagnification, especially 
considering the rapidly expanding incorpora- 
tion of new, shorter-chain PFAS that tend to 
be detected less frequently in biota (121). 

Key to understanding distribution of PFAS 
in biota are the specific interactions between 
PFAS and biological molecules. Although the 
bioaccumulation of some persistent organic 
pollutants is often related to lipid partition co- 
efficients, PFAS are not exclusively associated 
with lipids (120). Bioaccumulation modeling 
suggests that both protein interactions and 
lipid partitioning are important parameters 
for accurately assessing PFAS (122, 123), al- 
though predicting biomacromolecule inter- 
actions has proven difficult because of their 
physiochemical properties. PFAS do not be- 
have like neutral, hydrophobic organic con- 
taminants and instead are hypothesized to 
involve both phospholipids and proteinaceous 
tissues due in part to their anionic nature 
(123). Cooperative binding models have fur- 
ther correlated (and predicted) protein asso- 
ciations, relying on traditional measures of 
hydrophobicity and its effect on biomacro- 
molecule interactions (124). Therefore, both 
membrane-water partitioning and protein- 
water coefficients could be informative bio- 
accumulation indicators (i.e., bioconcentration 


Evich et al., Science 375, eabg9065 (2022) 


factors, bioaccumulation factors, and trophic 
magnification factors), and coupled with hepatic- 
and renal-clearance mechanisms across taxa 
are all vital in understanding PFAS persistence 
in organisms. Nevertheless, the specific physio- 
chemical differences, such as chain length, 
result in different distribution of PFAS in 
biological tissues (125). 

Ecotoxicological study of PFAS is further 
complicated by diversity of the PFAS class. Bio- 
accumulation factors for terrestrial vegetation 
are greater for PFCAs than for PFSAs, with 
shorter-chain perfluoroalkyl acids bioaccumu- 
lating to a greater degree than longer-chain 
ones, largely driven by variation in PFAS solu- 
bility (126), followed by uptake and transloca- 
tion into tissues (Fig. 4C) (100, 107). Conversely, 
potential perfluoroalkyl acid bioaccumula- 
tion in other fauna is greatest in long-chain 
compounds (120), with clear trends of bio- 
accumulation increasing with chain length 
(Figs. 4, C and F, and 5) (121). Long-chain 
PFAS concentrations tend to increase with 
trophic level in aquatic food webs, consistent 
with biomagnification processes (127). How- 
ever, transformation of precursors in exposure 
media and biota can confound interpretation 
of high concentrations of some PFAS (e.g., 
PFOS) as biomagnification without explicit 
identification of trophic magnification (128). 
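
Trophic magnification is often quantified as a trophic magnification factor (TMF), taken as 10 raised to the slope of a regression of log10 concentration on trophic level. That definition is a common convention assumed here, not one spelled out in this article, and the data below are synthetic. The sketch uses an ordinary least-squares fit written in plain Python.

```python
import math

def trophic_magnification_factor(trophic_levels, concentrations):
    """TMF = 10**slope of log10(concentration) regressed on trophic level."""
    n = len(trophic_levels)
    logs = [math.log10(c) for c in concentrations]
    mean_tl = sum(trophic_levels) / n
    mean_log = sum(logs) / n
    sxy = sum((tl - mean_tl) * (lc - mean_log)
              for tl, lc in zip(trophic_levels, logs))
    sxx = sum((tl - mean_tl) ** 2 for tl in trophic_levels)
    return 10.0 ** (sxy / sxx)

# Synthetic food web: log10(concentration) rises by 0.3 per trophic level.
levels = [2.0, 2.5, 3.0, 3.5, 4.0]
conc = [10 ** (0.5 + 0.3 * tl) for tl in levels]
tmf = trophic_magnification_factor(levels, conc)
```

A TMF above 1 indicates that concentrations increase up the food web. As the text notes, a high concentration in a predator is not by itself evidence of magnification, because precursor transformation can raise the concentration of a terminal product such as PFOS.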

Biomagnification in predators is related to 
trophic level, food-chain length, and capacity 
to metabolize PFAS precursors (125). Seabirds, 
marine mammals, and terrestrial species show 
the greatest magnification factors compared 
with exclusively aquatic food webs, in which 
organisms with gills eliminate perfluoroalkyl 
acids more efficiently (120). Effects in preda- 
tors, also frequently seen in humans, seem to 
be largely cytotoxic, immunological, reproduc- 
tive, or carcinogenic (125). Exposure models 
for aquatic food webs at AFFF-contaminated 
sites found benthic invertebrate consumers to 
be the avian dietary guild at highest exposure 
risk (114). At higher trophic levels, PFSAs (e.g., 
PFOS) bioaccumulate at greater rates than 
PFCAs (e.g., PFOA) of the same chain length 
(Fig. 5) (114, 129) and tend to be more toxic (4). 

Estuarine, marine, and freshwater environ- 
ments have demonstrated trophic magnifica- 
tion of long-chain PFAS (Fig. 5) (130, 131). 
Discrepancies in the relative concentrations 
of PFAS in fish compared with benthic in- 
vertebrates appear largely dependent on the 
compounds’ functional group and exposure 
routes, with elevated PFAS concentrations 
often linked to site-specific sources and/or 
benthic prey (131-133). Solubilized (i.e., water- 
borne) rather than dietary exposure was linked 
to reduced amphipod survival and reproduc- 
tion (133), but higher trophic-level organisms 
are exposed primarily through ingestion (109). 
Counterintuitively, exposure to low concentra- 
tions of PFAS can exacerbate bioconcentra- 
tion, motivating biologically based, physiological 
models exploring this phenomenon (127). Over- 
all, evidence suggests that the ultimate global 
reservoirs of PFAS are oceans and marine 
sediments (134), emphasizing the importance 
of elucidating consequences of PFAS contam- 
ination in these ecosystems (135). 

Ecological implications of PFAS exposure 
to aquatic and terrestrial organisms high- 
light the need to assess and incorporate 
new-approach methodologies that prioritize 
real-world hazard of organismal exposure and 
subsequent risk. Mechanism-based studies and 
in silico approaches are beginning to fill data 
gaps pinpointing the cellular and molecular 
pathways resulting in toxicity (136, 137). Elim- 
ination half-life has been identified as an 
end point relevant to bioaccumulation and 
effects (4). In addition to prioritizing chem- 
ical selection based on environmental finger- 
printing, cross-taxa and sensitive-taxa toxicity 
testing research should focus on in silico model 
development that can determine tissue distrib- 
ution, molecular perturbations, and trophic- 
level accumulation. As the scale of assessment 
expands, so does the need for the continued 
development of adverse-outcome-pathway 
models to facilitate translation of exposure 
concentration/dose to organismal-effect end 
points for the projection of population-level 
consequences, including multigenerational 
effects. For instance, unexposed progeny of 
fish exposed to PFOA and PFOS had lower 
survival rates, reduced growth, and thyroid- 
related effects as revealed by histology (138). 
Similarly, lipid metabolism (139) and behav- 
ioral end points (140) were affected in sub- 
sequent generations of other species. 

Although data are available on potentially 
common mechanisms of action and toxicity 
between species (e.g., lipid metabolism, mod- 
ification of cell membrane integrity, protein 
binding, and nuclear receptor activation), the 
large number of PFAS underscores the need 
to augment conventional in vivo testing with 
in vitro and in silico approaches (4). Using 
these approaches, a number of moderate- and 
long-chain PFAS have been shown to elicit 
varying degrees of oxidative stress and modify 
the antioxidant defense systems of inverte- 
brates, induce neurotoxic and reprotoxic ef- 
fects across species, and reside in organisms 
longer than or comparable to any known class 
of anthropogenic contaminants (120). PFAS 
toxicity, bioaccumulation, and persistence gen- 
erally are increasingly problematic with in- 
creasing chain length. 


Remediation 


Treatment and remediation of PFAS-affected 
media is especially challenging because the 
chemistry of PFAS renders them unaffected 
by most traditional treatment technologies 
(141). Given the strength of the carbon-fluorine 
bond, complete mineralization is difficult, 
with fluorinated products of incomplete de- 
struction remaining a concern (142, 143). Many 
existing treatment technologies are only capa- 
ble of concentrating PFAS (144), and concen- 
trated treatment residuals can result in the 
(145). Therefore, treatment and remediation 
approaches for contaminated media should be 
considered in terms of a total management 
approach influenced by the primary source(s), 
the affected media, and the ultimate method 
of destruction or long-term storage of PFAS. 


need for a preventative and holistic approach 
breaking the treatment cycle. 


PFAS-affected drinking water often is the 
primary route of human exposure (146), and 
treatment techniques for aqueous media are 
the most well established, although perform- 
ance and cost for the removal of some short- 
chain PFAS can be particularly challenging. 
Management can occur at primary sources 
(i.e., treatment of industrial wastewater efflu- 
ent), at the secondary concentration source 
(e.g., drinking water treatment plants or land- 
fill leachate), or in diffuse environmental 
media (e.g., groundwater). Treatment of dif- 
fuse media can involve ex situ “pump-and- 
treat” approaches to adjoin groundwater to 
aqueous treatment technologies. The most 
established treatments for water are sorption 
to granular activated carbon (GAC) or ion- 
exchange stationary phases (147). Powdered sor- 
bents can be used; however, particle-separation 
technology is needed to physically recover the 
spent sorbent (e.g., conventional treatment, 
microfiltration, or ultrafiltration). 

Removal performance of sorbents differs 
among targeted PFAS, concentrations, back- 
ground water quality, and sorbent properties 
among other parameters (141, 147, 148). Anoth- 
er concentrative approach is the use of high- 
pressure membrane systems such as reverse 
osmosis or nanofiltration. The residual stream 
for sorbent technologies is the spent media 
or a regenerate stream for regenerable ion- 
exchange media, whereas high-pressure mem- 
branes yield an enriched retentate. Both 
residual streams need to be processed further 
(Fig. 6). GAC typically is reactivated and single- 
use resins typically are incinerated, but little 
is known regarding PFAS fate in full-scale 
facilities. Likewise, studies evaluating treat- 
ment options for PFAS-laden reverse-osmosis 
membrane concentrate or ion-exchange rege- 
nerant are in their infancy (149). Other, less- 
used techniques include membrane distillation, 
electrodialysis reversal, flotation, electrocoagu- 
lation, and evaporation. These technologies 
remain limited to niche applications because of 
their performance, cost, and lack of process 
familiarity. 

Environmental media such as soils can 
be diffusely contaminated through wet/dry 
deposition; land application of PFAS-enriched 
materials such as biosolids, wastewater, or 
leachate; usage of PFAS-containing products 
such as AFFFs and pesticides; or uncontrolled 
release through unlined landfills or spills. Soil 
contamination is a threat to nearby water 
sources because of downward and lateral 
migration of PFAS into receiving water bodies 
(Fig. 4). In some cases, the large volume of soil 
that is affected makes ex situ removal and 
destruction a considerable logistics problem. 
Another approach to site management is in 
situ modification to enhance mobility of 
PFAS for pump-and-treat application or to 
stabilize PFAS migration using GAC or other 
sorbents (e.g., clays) to limit impacts (150). 
Although this can be an effective short-term 
site-management technique, it is not a per- 
manent solution, and likely will not retain all 
PFAS species effectively (148, 150, 151). In situ 
treatment of PFAS in aquifers requires different 
techniques, such as permeable reactive bar- 
riers or addition of powdered activated carbon, 
none of which have shown the ability to 
control PFAS plumes in the long term (150). 
The terminal destination of PFAS wastes is 
of primary concern for the life cycle manage- 
ment of these compounds. Currently, two com- 
mercially viable long-term storage approaches 
are landfilling affected media or underground 
injection of contaminated water (145). Such 
sequestration is a temporary solution. Because 
most PFAS do not naturally degrade to non- 
fluorinated chemical species, these long-term 
sinks are time-delayed sources. For example, 
landfills are recognized PFAS sources through 
PFAS-enriched landfill gas and liquid leachates 
(71). The only permanent solution to PFAS is 
the destructive remineralization of the under- 
lying fluorine, whether directly acting on con- 
taminated media or from treatment of residual 
streams of other treatment techniques, such as 
spent sorbents or regenerant solutions. 

Thermal treatment is a destructive approach 
that can achieve PFAS mineralization. Incin- 
eration by itself has been shown to at least par- 
tially destroy even highly fluorinated wastes 
(143), and advanced thermal oxidation can be 
used on solid, liquid, and gas samples to con- 
vert PFAS to constituent gases with an acid- 
scrubber cleanup (152). Ideally, this process 
yields HF, NOx, SOx, and CO2 gases that are 
handled by traditional air pollution control 
technologies. However, thermal treatment re- 
quires substantial temperatures (>700°C) for 
a sufficient period to convert PFAS into HF 
and nonfluorinated products, with more highly 
fluorinated species requiring more time and 
higher temperature (153, 154). Catalytic oxida- 
tion at lower temperatures (e.g., 400°C) has 
been demonstrated for some PFAS (155). Ther- 
mal processes, however, have not been dem- 
onstrated at scale, where inefficiencies can 
reduce performance. Atmospheric emission 
of products of incomplete destruction or the 
air pollution control technologies associated 
with thermal treatment processes, including 
the regeneration of spent GAC, can become 
additional PFAS sources. Capture or destruc- 
tion of these products in the exhaust of ther- 
mal processes also is an area of active research, 
although forefront technologies are like those 
applied for other media, namely scrubbers, 
activated-carbon adsorption, and thermal 
oxidation. 

Other destructive treatments for aqueous 
streams include electrochemical degradation, 
sonolysis, nonthermal plasma, advanced oxi- 
dation (e.g., sulfate radicals) and reduction 
(solvated electrons), biodegradation (Feammox), 
zero-valent iron, hydrothermal, and supercrit- 
ical water oxidation (149, 156). Although many 
of these technologies have shown the ability to 
destroy select PFAS, none have demonstrated 
long-term performance approaching mineral- 
ization at full scale with natural and industrial 
water matrices for a wide assortment of PFAS. 
Also, the energy costs of many of these tech- 
nologies limit their sustainability and desira- 
bility, and the formation of harmful by-products 
(e.g., bromate, perchlorate) remains a concern 
(144). The lack of widespread testing and lim- 
ited field usage has led to a reluctance in using 
these technologies because additional manage- 
ment of the waste or residual streams will be 
needed. These unknowns, among others, fur- 
ther demonstrate the need to minimize use of 
PFAS and find a total waste-management ap- 
proach in which complete destruction of PFAS 
is ensured. 


Conclusions 


The pool of new PFAS, for which physical, 
chemical, and toxicological data remain un- 
determined, is expanding rapidly and now 
includes untold numbers of compounds having 
widely varying chemical structures, volatilities, 
and solubilities, as well as uncertain potential 
exposure consequences. Early studies on struc- 
turally similar PFAS suggest that behavioral 
trends gleaned from legacy PFAS studies can 
be useful as a basis to predict fate, toxicity, and 
remediation strategies for emerging com- 
pounds. Recently, an internationally authored 
paper called for PFAS to be managed as a class 
based upon widespread use in commerce, shared 
inclusion of strongly bonded perfluorocarbon 
moiety, and the resulting environmental per- 
sistence of common terminal products (157). 
Current international reporting practices 
used to document PFAS synthesis, production 
volumes, and potential releases vary among 
countries and are not always tailored to pro- 
vide the knowledge necessary to adequately 
track and understand the movement of these 
compounds in the environment. These efforts 
typically serve as a critical first step in de- 
veloping knowledge to be used in future as- 
sessment and potential regulation of PFAS. 
In the United States, expansion of the Toxic 
Release Inventory will include ~172 long-chain 
PFAS starting in 2021, providing limited but 
valuable information in the form of sources, 
compositions, and quantities released for these 
compounds. However, under regulatory frame- 
works around the world, information on many 
PFAS is protected as confidential business 
information and will not be disclosed pub- 
licly (16), thereby necessitating substantial 
continued discovery and forensic identifica- 
tion efforts around the world. Other PFAS, 
such as many of those classified as chemical 
substances of unknown or variable compo- 
sition, by-products, or biological materials and 
polymers, may be too complex to fully charac- 
terize and can challenge scientific investigation. 
There is an ongoing need to advance re- 
sponsive PFAS science, particularly regard- 
ing investigating environmental sources and 
sinks, toxicity, and remediation technologies, 
but evidence suggests that preventative up- 
stream actions are critical to facilitating the 
transition to safer alternatives and minimizing 
the impact of PFAS on human health and the 
environment. Examples of these upstream ac- 
tions include the EPA’s Stewardship Program 
(158), the Amendment to the Polymer Exemp- 
tion Rule removing side-chain fluorotelomer 
polymers from the Exemption Rule (159), the 
Significant New Use Rule removing an ex- 
emption for a set of PFAS used as coatings 
(160), the recently announced Comprehensive 
National Strategy to confront PFAS pollution 
(161), and a ban on PFAS in food contact paper 
in Denmark (162). Regardless of the regula- 
tory approach implemented, collaborative ef- 
forts among scientists, industrial producers, 
and policy makers will remain key in finding 
effective and timely solutions (163). 


INTRODUCTION: Human T cell responses to 
antigen stimulation, including the produc- 
tion of cytokines, are critical for healthy im- 
mune function and can be dysregulated in 
autoimmunity, immunodeficiencies, and can- 
cer. A systematic understanding of the regu- 
lators that orchestrate T cell activation with 
gain-of-function and loss-of-function gene per- 
turbations would offer additional insights into 
disease pathways and further opportunities to 
engineer next-generation immunotherapies. 


RATIONALE: Although CRISPR activation 
(CRISPRa) and CRISPR interference (CRISPRi) 
screens are powerful tools for gain-of-function 
and loss-of-function studies in immortalized 
cell lines, deploying them at scale in primary 
cell types has been challenging. Here, we de- 
veloped a CRISPRa and CRISPRi discovery 
platform in primary human T cells and per- 
formed genome-wide screens for functional 
regulators of cytokine production in response 
to stimulation. 


RESULTS: We optimized lentiviral methods to 
enable efficient and scalable delivery of the 
CRISPRa machinery into primary human 
T cells. This platform allowed us to perform 
genome-wide pooled CRISPRa screens to dis- 
cover regulators of cytokine production. Pools 
of CRISPRa-perturbed cells were isolated by 
fluorescence-activated cell sorting into high 
and low bins based on levels of endogenous 
interleukin-2 (IL-2) production in CD4+ T cells 
or interferon-γ (IFN-γ) production in CD8+ 
T cells. Hits included proximal T cell recep- 
tor (TCR) signaling pathway genes, indicat- 
ing that overexpression of these components 
could overcome signaling “bottlenecks” and 
tune stimulation and cytokine production. 

Genome-wide CRISPRa/i screens discover tunable regulators of stimulation-responsive cytokine 
production in primary human T cells. Genome-wide CRISPRa/i gain-of-function and loss-of-function screens in 
human T cells allowed for systematic identification of regulators of cytokine production. Follow-up on key 
CRISPRa screen hits with secretome and scRNA-seq analysis helped to decode how these regulators tune 
T cell activation and program cells into different stimulation-responsive states. 

Reciprocal genome-wide loss-of-function 
screens with CRISPRi detected hits with crit- 
ical regulatory functions, including some 
missed by CRISPRa. By contrast, CRISPRa 
also identified hits that may not be required 
and in some cases were expressed at only low 
levels under the conditions of the screen. This 
was strongly exemplified by regulation of IFN-γ 
production by the nuclear factor κB (NF-κB) 
signaling pathway, in which CRISPRi identified a 
required TCR-NF-κB signaling circuit (including 
MALT1 and BCL10). CRISPRa selectively detected 
a set of tumor necrosis factor superfamily re- 
ceptors that also signal through NF-κB, including 
4-1BB, CD27, CD40, and OX40. These receptors 
were not individually required for signaling in 
our experimental conditions but could promote 
IFN-γ when overexpressed. Thus, CRISPRa 
and CRISPRi complement each other for the 
comprehensive discovery of functional cyto- 
kine regulators. 

Arrayed CRISPRa perturbation validated the 
effects of key hits in CD4+ and CD8+ T cells. 
We also assessed how individual CRISPRa per- 
turbations more broadly reprogram cytokine 
production beyond IL-2 and IFN-γ by measuring 
a panel of secreted cytokines and chemokines. 

Finally, we developed a platform for pooled 
CRISPRa perturbations coupled with single-cell 
RNA-sequencing (scRNA-seq) readout (CRISPRa 
Perturb-seq) in primary human T cells. We used 
CRISPRa Perturb-seq for deep molecular char- 
acterization of single-cell states caused by 
70 genome-wide screen hits and controls to 
reveal how regulators of cytokine production 
both tune T cell activation and program cells 
into different stimulation-responsive states. 


CONCLUSIONS: Our study demonstrates a ro- 
bust platform for large-scale pooled CRISPRa 
and CRISPRi in primary human T cells. Paired 
CRISPRa and CRISPRi screens enabled com- 
prehensive functional mapping of gene net- 
works that can modulate cytokine production. 
Follow-up of CRISPRa hits with arrayed phe- 
notypic analyses and with pooled scRNA-seq 
approaches enabled precise functional charac- 
terization of key screen hits, revealing how key 
perturbations may tune T cells to therapeuti- 
cally relevant states. Future CRISPRa and 
CRISPRi screens in primary cells could iden- 
tify targets for improved next-generation cellu- 
lar therapies. 

Regulation of cytokine production in stimulated T cells can be disrupted in autoimmunity, 
immunodeficiencies, and cancer. Systematic discovery of stimulation-dependent cytokine regulators 
requires both loss-of-function and gain-of-function studies, which have been challenging in primary 
human cells. We now report genome-wide CRISPR activation (CRISPRa) and interference (CRISPRi) 
screens in primary human T cells to identify gene networks controlling interleukin-2 (IL-2) and 
interferon-γ (IFN-γ) production. Arrayed CRISPRa confirmed key hits and enabled multiplexed 
secretome characterization, revealing reshaped cytokine responses. Coupling CRISPRa screening with 
single-cell RNA sequencing enabled deep molecular characterization of screen hits, revealing how 
perturbations tuned T cell activation and promoted cell states characterized by distinct cytokine 
expression profiles. These screens reveal genes that reprogram critical immune cell functions, which 
could inform the design of immunotherapies. 


Regulated T cell cytokine production in 
response to stimulation is critical for 
balanced immune responses. Cytokine 
dysregulation can lead to autoimmunity, 
immunodeficiency, and immune evasion 
in cancer (1-4). Interleukin-2 (IL-2), which is 
secreted predominantly by CD4+ T cells, drives 
T cell expansion (5) and is therapeutically 
applied in autoimmunity and cancer at differ- 
ent doses (6). Interferon-γ (IFN-γ) is a cytokine 
secreted by both CD4+ and CD8+ T cells that 
promotes a type I immune response against 
intracellular pathogens, including viruses (4), 
and is correlated with positive cancer immuno- 
therapy responses (7-9). Much of our current 
understanding of the pathways leading to 
cytokine production in humans originates 
from studies in transformed T cell lines, which 
often are not representative of primary human 
cell biology (10-12). Comprehensive under- 
standing of pathways that control cytokine 
production in primary human T cells would 
facilitate the development of next-generation 
immunotherapies. 

Unbiased forward genetic approaches can 
uncover the components of regulatory net- 
works systematically, but challenges with 
efficient Cas9 delivery have limited their 
application in primary cells. Genome-wide 
CRISPR knockout screens have been completed 
using primary mouse immune cells from Cas9- 
expressing transgenic mice (13-15), including 
a screen for regulators of innate cytokine pro- 
duction in dendritic cells (13). Genome-scale 
CRISPR studies in human primary cells have 
recently been accomplished using transient 
Cas9 electroporation to introduce gene knock- 
outs (16, 17). However, comprehensive discovery 
of regulators requires both gain-of-function 
and loss-of-function studies. For example, 
CRISPR activation (CRISPRa) gain-of-function 
screens can discover genes that may not nor- 
mally be active in the tested conditions but 
can promote phenotypes of interest (18, 19). In 
contrast to a CRISPR knockout, CRISPRa and 
CRISPR interference (CRISPRi) require the 
sustained expression of an endonuclease-dead 
Cas9 (dCas9) and, because of poor lentiviral 
delivery, have been limited to small-scale expe- 
riments in primary cells (20, 21). Here, we 
developed a CRISPRa and CRISPRi screening 
platform in primary human T cells, which 
allowed for the systematic discovery of genes 
and pathways that can be perturbed to tune 
stimulation-dependent cytokine responses. 


Genome-wide CRISPRa screens identify regulators 
of IL-2 and IFN-γ production in T cells 


To enable scalable CRISPRa in primary human 
T cells, we developed an optimized high-titer 
lentiviral production protocol with a minimal 
dCas9-VP64 vector (pZR112), allowing for trans- 
duction efficiencies up to 80% (fig. S1). A 
second-generation CRISPRa synergistic ac- 
tivation mediator (SAM) system (22, 23) in- 
duced robust increases in target expression 
of established surface markers (fig. S2). Next, 
we scaled up our platform to perform pooled 
genome-wide CRISPRa screens targeting 
>18,800 protein-coding genes with >112,000 
single-guide RNAs (sgRNAs) (22). We used 
fluorescence-activated cell sorting (FACS) 
to separate IL-2-producing CD4+ T cells and 
IFN-γ-producing CD8+ T cells into high and 
low bins (Fig. 1A and fig. S3, A to D). Subse- 
quent sgRNA quantification confirmed that 
sgRNAs targeting IL-2 (IL2) and IFN-γ (IFNG) 
were strongly enriched in the respective cyto- 
kine high populations, and nontargeting con- 
trol sgRNAs were not enriched in either bin 
(Fig. 1B). Both CRISPRa screens were highly 
reproducible in two different human blood 
donors (Fig. 1, C and D, and fig. S3, E and 
F). Gene-level statistical analysis of the IL-2 
and IFN-γ CRISPRa screens revealed 444 and 
471 hits, respectively, including 171 shared 
hits (Fig. 1E; fig. S3, G and H; and tables S1 
and S2). Thus, CRISPRa screens provide a 
robust platform to discover gain-of-function 
regulators of stimulation-dependent responses 
in primary cells. 
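
The screen readout described above (sgRNA abundance compared between high and low sorting bins, summarized per gene as the median sgRNA log2-fold change and averaged over two donors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the reads-per-million scaling, the pseudocount, the function names, and the toy counts are all hypothetical, and the gene-level statistical test used to call hits is not reproduced here.

```python
import math
from statistics import median


def sgrna_log2fc(high_counts, low_counts, pseudocount=1.0):
    """log2(high/low) per sgRNA after scaling each bin to reads per million.

    The scaling and the pseudocount are assumptions made for this sketch.
    """
    high_total = sum(high_counts.values())
    low_total = sum(low_counts.values())
    lfc = {}
    for sgrna, count in high_counts.items():
        high = count / high_total * 1e6 + pseudocount
        low = low_counts.get(sgrna, 0) / low_total * 1e6 + pseudocount
        lfc[sgrna] = math.log2(high / low)
    return lfc


def gene_scores(lfc, sgrna_to_gene):
    """Summarize each gene as the median log2-fold change of its sgRNAs."""
    per_gene = {}
    for sgrna, value in lfc.items():
        per_gene.setdefault(sgrna_to_gene[sgrna], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


def mean_across_donors(donor_scores):
    """Average gene scores over donors, keeping genes present in every donor."""
    genes = set.intersection(*(set(scores) for scores in donor_scores))
    return {g: sum(s[g] for s in donor_scores) / len(donor_scores) for g in genes}


# Toy read counts for one donor (hypothetical numbers, not screen data).
guide_to_gene = {"IL2_sg1": "IL2", "IL2_sg2": "IL2", "NTC_sg1": "NTC", "NTC_sg2": "NTC"}
high_bin = {"IL2_sg1": 400, "IL2_sg2": 200, "NTC_sg1": 200, "NTC_sg2": 200}
low_bin = {"IL2_sg1": 100, "IL2_sg2": 100, "NTC_sg1": 400, "NTC_sg2": 400}

donor1 = gene_scores(sgrna_log2fc(high_bin, low_bin), guide_to_gene)
combined = mean_across_donors([donor1, donor1])  # the same toy donor used twice
```

In this toy example the IL2-targeting guides are enriched in the high bin relative to the nontargeting guides. With only four guides the nontargeting controls appear depleted purely because of the compositional scaling; in a genome-wide library they would sit near zero.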

CRISPRa hits included components of the 
T cell receptor (TCR) signaling pathway and 
T cell transcription factors. Activation of TBX21 
(encoding T-bet), which promotes both mem- 
ory CD8+ T cell and CD4+ T helper cell 1 (TH1) 
differentiation (24-26), selectively enhanced 
the signature type I cytokine IFN-γ (Fig. 1E). 
By contrast, sgRNAs activating GATA3, which 
promotes type II differentiation by antagonizing 
T-bet (25, 27), had the opposite effects (Fig. 1E). 
Overexpression of members of the proximal 
TCR signaling complex, such as VAV1, CD28, 
LCP2 (encoding SLP-76), and LAT (28, 29) 
reinforced T cell activation and were enriched 
in both cytokine-high bins. Conversely, the 
negative TCR signaling regulators MAP4K1 
and SLA2 were depleted in these bins (Fig. 1, 
B and E) (30, 31). Thus, CRISPRa identifies 
critical “bottlenecks” in signals leading to cyto- 
kine production. 


Complementary CRISPRa and CRISPRi screens 
comprehensively reveal circuits of cytokine 
production in T cells 


Fig. 1. Genome-wide CRISPRa screens for cytokine production in stimulated primary human T cells. 
(A) Schematic of CRISPRa screens. (B) sgRNA log2-fold changes for genes of interest in IL-2 (left) and 
IFN-γ (right) screens. Bars represent the mean log2-fold change for each sgRNA across two human blood 
donors. Density plots above represent the distribution of all sgRNAs. (C and D) Scatter plots of median 
sgRNA log2-fold change (high/low sorting bins) for each gene, comparing screens in two donors, for 
IL-2 (C) and IFN-γ (D) screens. (E) Comparison of gene log2-fold change (median sgRNA, mean of two donors) 
in IL-2 and IFN-γ screens. 

CRISPRa screens were effective in identifying 
limiting factors in cytokine production but 
they could miss necessary components that 
would only be identified through loss-of- 
function studies. We therefore performed recip- 
rocal genome-wide CRISPRi screens, adapting 
our optimized lentiviral protocols (Fig. 2, A 
and B; fig. S4; and tables S1 and S2). Dropout 
of gold standard essential genes (32) and re- 
producibility across two human donors con- 
firmed the screen quality (fig. S5). The CRISPRi 
IL-2 and IFN-γ screens identified 226 and 
203 gene hits, respectively, including 92 shared 
hits (Fig. 2, A and B). As expected, the CRISPRi 
hits were biased toward genes with high mRNA 
expression, including members of the CD3 com- 
plex, whereas CRISPRa additionally identified 
regulators that were expressed either at low 
levels or not at all in T cells under the screened 
conditions (Fig. 2, C and D, and fig. S6). For 
example, PIK3AP1 and IL1R1 were expressed 
at low levels under the screened conditions 
(fig. S7A). They are potentially inducible in 
some T cell contexts (fig. S7, B to D); how- 
ever, they were detected as hits by CRISPRa 
but not CRISPRi. 

The power of coupling activation and inter- 
ference screening was exemplified further by 
the identification of two IFN-γ-regulating cir- 
cuits. CRISPRi screens identified key compo- 
nents of the nuclear factor κB (NF-κB) pathway 
that are required for IFN-γ production (and, 
fore not detected by CRISPRi (Fig. 2, E and F). 
Thus, CRISPRa and CRISPRi complement each 
other for the comprehensive discovery of func- 
tional cytokine regulators. 
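
The complementarity of the two screen types can be made concrete with simple set operations on hit lists. The sketch below is illustrative only: MALT1, BCL10, CD27, and CD40 are named in this article as CRISPRi-detected or CRISPRa-detected regulators of IFN-γ, but GENE_X is a hypothetical placeholder, the function name is an assumption, and the lists are toy subsets rather than the actual hit tables.

```python
def classify_hits(crispra_hits, crispri_hits):
    """Split screen hits by the perturbation type(s) that detected them."""
    activation = set(crispra_hits)
    interference = set(crispri_hits)
    return {
        "both": activation & interference,
        "crispra_only": activation - interference,
        "crispri_only": interference - activation,
    }


# Toy hit lists for the IFN-γ screens; GENE_X is a hypothetical shared hit.
ifng_crispra = ["CD27", "CD40", "GENE_X"]
ifng_crispri = ["MALT1", "BCL10", "GENE_X"]
groups = classify_hits(ifng_crispra, ifng_crispri)
```

Genes that fall in the CRISPRa-only group are candidates that can promote the phenotype when overexpressed without being individually required, whereas the CRISPRi-only group contains required components that overexpression did not reveal.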

To gain insights into functional pathways 
enriched across CRISPRi and CRISPRa screens, 
we completed gene set enrichment analysis 
(GSEA) of Kyoto Encyclopedia of Genes and 
Genomes (KEGG) pathways, identifying mul- 
tiple immune-related pathways as being en- 
riched across screens (fig. S8B). Furthermore, 
we analyzed data from numerous genome- 
wide association studies (GWAS) to determine 
whether the heritability of complex immune 
traits was enriched in genomic regions har- 
boring our screen hits by stratified linkage 
disequilibrium score regression (s-LDSC). Both 
CRISPRi and CRISPRa regulators of IFN-γ and 
CRISPRa regulators of IL-2 were in regions 
enriched for immune trait heritability com- 
pared with nonimmune traits or an expression- 
matched background set (fig. S8C). Thus, these 
forward genetic screens may serve as a resource 
to help prioritize candidate functional genes in 
genomic regions associated with complex 
immune diseases. 

We next completed integrative analyses 
of gene hits across CRISPRa and CRISPRi 
screens for both cytokines. We found that a 
few genes were identified across all screens 
(e.g., ZAP70 as a positive regulator and CBLB 
as a negative regulator), representing core 
regulators of stimulation-responsive cytokine 
production in T cells. Most hits, however, were 
either cytokine-specific (IL-2 in CD4+ T cells or 
IFN-γ in CD8+ T cells) or perturbation-specific 
(activation or interference) (fig. S8D). For a 
few target genes, including PTPRC (CD45), 
CRISPRa and CRISPRi both influenced cyto- 
kine production in the same direction, sug- 
gesting that for some genes, activation and 
interference both impair optimal levels (fig. 
S8E). The marked overlap in regulators be- 
tween IL-2 in CD4+ T cells and IFN-γ in CD8+ 
T cells led us to perform additional genome- 
wide CRISPRa screens for IL-2, IFN-γ, and 
TNF-α in CD4+ T cells, allowing for direct 
comparisons of type 1 cytokine regulators in 
CD4+ T cells (fig. S9). Many of the strongest 
positive (e.g., VAV1, CD28, and LCP2) and 
negative hits (e.g., MAP4K1, LAT2, and GRAP) 
overlapped across all CRISPRa screens, likely 
representing core regulators of type 1 cytokine 
production in response to stimulation and 
costimulation. Additionally, these screens 
identified hits that could potentially increase 
or decrease individual cytokines selectively. 
Thus, CRISPRi and CRISPRa hits reveal both 
core and context-specific regulators of cyto- 
kine production. 

Fig. 2. Integrated CRISPRa and CRISPRi screens mapping the genetic circuits 
underlying T cell cytokine response in high resolution. (A and B) Median 
sgRNA log2-fold change (high/low sorting bins) for each gene, comparing CRISPRi 
screens in two donors, for IL-2 (A) and IFN-γ (B) screens. (C) Distributions of 
gene mRNA expression for CRISPRa and CRISPRi cytokine screen hits in resting 
CD4+ T cells (this study). (D) Comparison of IL-2 CRISPRi and CRISPRa screens with 
genes belonging to the TCR signaling pathway (KEGG pathways) indicated in 
colors other than gray. (E) Comparison of IFN-γ CRISPRi and CRISPRa screens with 
manually selected NF-κB pathway regulators labeled. All other genes are shown 
in gray. (F) Map of NF-κB pathway regulators labeled in (D). (G) Map of screen hits 
with previous evidence of defined function in T cell stimulation and costimulation 
signal transduction pathways. Genes shown are significant hits in at least one screen 
and were selected based on review of the literature and pathway databases (e.g., 
KEGG and Reactome). Tiles represent proteins encoded by indicated genes with the 
caveat that, because of space constraints, subcellular localization is inaccurate 
because many of the components shown in the cytoplasm occur at the plasma 
membrane. Tiles are colored according to log2-fold change Z score, as shown 
in the subpanel, with examples of different hits. Large arrows at the top represent 
stimulation/costimulation sources. (H) Select screen hits with less well-described 
functions in T cells in the same format as (G). For (H), only significant hits from 
the top 20 positive and negative ranked genes by log2-fold change for each screen 
were candidates for inclusion. 

We used our integrated dataset combined 
with literature review to build a high-resolution 
map of tunable regulators of signal transduc- 
tion pathways leading to cytokine produc- 
tion (Fig. 2G). This included calcium pathway 
signaling genes (e.g., PLCG1, PLCG2, PRKCB, 
PRKD2, and NFATC2), and cytokine signaling 
genes (e.g., STAT3, JAK1, JAK3, and SOCS3), 
the latter suggesting feedback circuits among 
cytokine signals. In particular, CRISPRa iden- 
tified regulators absent from previous litera- 
ture (e.g., APOBEC3A/D/C, FOXQ1, and EMP1) 
(Fig. 2H), underscoring the need for gain-of- 
function screens for comprehensive discov- 
ery. Thus, CRISPRa and CRISPRi screens 
complement one another to map the tunable 
genetic circuits controlling T cell stimulation- 
responsive cytokine production. 


Arrayed characterization of selected CRISPRa 
screen hits 


We next performed arrayed CRISPRa experi- 
ments for deeper phenotypic characterization 
of screen hits (Fig. 3A). We selected 14 screen 
hits (from different screen categories) (Fig. 3B) 
including the established regulators VAV1 
and MAP4K1 and the positive controls IL2
and IFNG. Notably, we included genes with
relatively low expression in T cells under our
experimental conditions: FOXQ1, IL1R1, LHX6,
and PIK3AP1 (fig. S7). First, we validated that
selected sgRNAs increased the expression of
target gene mRNA (fig. S10). Next, we assessed
IL-2, IFN-γ, and TNF-α by intracellular stain-
ing in both CD4+ and CD8+ T cells. Thirteen of
14 target genes caused significant (q < 0.05)
changes in the proportion of cells positive 
for the relevant cytokine(s), with at least one 
sgRNA (Fig. 3, C and D, and fig. S11). Further- 
more, we observed effects on both IL-2 and 
IFN-γ double- and single-positive popula-
tions (fig. S12, A to C). With the exception of
TNFRSF1A (and IL2 or IFNG), positive regulators
did not cause spontaneous cytokine production 
without stimulation (Fig. 3D and fig. S11B). 
Although IL-2 was screened in CD4+ T cells and
IFN-γ in CD8+ T cells, CRISPRa sgRNA effects
were highly correlated across both lineages 
(Fig. 3E). We also assessed T cell differentia- 
tion and observed that FOXQ1 and TNFRSF1A
significantly decreased the percentage of
CD62L+ cells, indicating a shift toward effector
T cell states as a potential mechanism (fig.
S12D). Thus, these studies validate the pooled
CRISPRa screens and begin to characterize
cytokine production and cell differentiation
states promoted by activation of key target 
genes. 

We next tested whether genes identified by 
CRISPRa could also regulate cytokines when 
overexpressed as cDNA transgenes, because 
continuous expression of CRISPRa would pre- 
sent challenges in cell therapies caused by Cas9 
immunogenicity (33) (fig. S13A). cDNA trans-
gene overexpression of CRISPRa hits affected 
cytokine production in T cells stimulated with 
antibodies or antigen-positive cancer cells (fig. 
S13, B to D). Thus, this strategy could poten- 
tially be used to implement CRISPRa discov- 
eries in engineered T cell therapies. 

We next assessed how individual CRISPRa 
perturbations reprogram cytokine production 
by measuring a broad panel of 48 secreted 
cytokines and chemokines, 32 of which were 
detected in control samples (fig. S14A and 
table S6). After confirming that the effects on 
IL-2, IFN-γ, and TNF-α measurements were
generally consistent with intracellular staining 
(Fig. 3F and fig. S14B), we performed principal 
component analysis and hierarchical clustering 
on all cytokines. We observed sgRNA catego- 
rical grouping consistent with that observed 
in the screens, with sgRNAs targeting genes 
identified as regulators of both cytokines, caus- 
ing broad increases or decreases in cytokine 
concentration (Fig. 3G and fig. S14C). There were 
distinct patterns in the classes of cytokines 
increased by different regulators (Fig. 3H). 
VAV1 and FOXQ1 (a transcription factor that 
has not been well characterized in T cells) led 
to preferential increases in type 1 signature 
cytokines and dampened type 2 cytokines. 
Unexpectedly, OTUD7B, a positive regulator 
of proximal TCR signaling (34), had a distinct 
effect and increased type 2 cytokines (fig. S14D). 
We next investigated whether modulations in 
the secretome correlated with transcription- 
al control of the corresponding genes. Taking 
FOXQ1 as an example, we performed bulk RNA
sequencing (RNA-seq) on FOXQ1 and control
sgRNA CD4+ T cells and found that the tran-
scriptional changes correlated strongly with
the secretome effects (fig.
S15). Thus, the identified regulators may not
only modulate TCR stimulation and signaling 
but also tune the T cell secretome toward spe- 
cific signatures. 


CRISPRa Perturb-seq characterizes the 
molecular phenotypes of cytokine regulators 


To assess the global molecular signatures 
resulting from each CRISPRa gene induction, 
we developed a platform to couple pooled 
CRISPRa perturbations with barcoded single- 
cell RNA-seq (scRNA-seq) readouts (CRISPRa
Perturb-seq) (Fig. 4A). Because similar CRISPRa 
Perturb-seq approaches have been powerful 
in cell lines and animal models (35-37), we 
incorporated a direct-capture sequence into 
the CRISPRa-SAM modified sgRNA scaffold to 
enable compatibility with droplet-based scRNA-
seq methods (fig. S16). 

We performed CRISPRa Perturb-seq charac- 
terization of regulators of stimulation responses 
in ~56,000 primary human T cells, targeting 
70 hits and controls from our genome-wide 
CRISPRa cytokine screens (Fig. 4, A and B, 
and fig. S17, A to C). First, we confirmed that 
sgRNAs led to significant increases in the ex- 
pression of their target genes (fig. S17D). Next, 
uniform manifold approximation and pro- 
jection (UMAP) dimensionality reduction re- 
vealed discrete separation of the resting and 
restimulated cells (fig. S17E) and showed rela- 
tively even distribution of cells from two 
donors (Fig. 4C and fig. S17F). Gene signatures 
allowed us to resolve most T cells as either 
CD4+ or CD8+ (Fig. 4D and fig. S17, G and H).
Thus, we generated a high-quality CRISPRa 
Perturb-seq dataset. 

Cytokine production can be tuned by rein- 
forced TCR signaling. To identify CRISPRa 
gene perturbations that tune the general 
strength of stimulation-responsive genes, 
we calculated a scRNA-seq “activation” score 
based on a gene signature that we derived 
by comparing resting and restimulated cells 
within the nontargeting control sgRNA group 
(fig. S18). Projecting activation scores on the 
stimulated cell UMAP revealed discrete regions 
of higher and lower activation scores among 
the restimulated cells (Fig. 4E). We next exam- 
ined activation scores across CRISPRa pertur- 
bations (Fig. 4F). Negative regulators except 
IKZF3 (encoding the transcription factor 
Aiolos) decreased activation scores, suggest- 
ing that they act to broadly dampen stimula- 
tion strength. By contrast, IKZF3 reduced IFNG
expression without reducing the overall acti- 
vation score (Fig. 4F and fig. S19A), indicative 
of a possible distinct mechanism of cytokine 
gene regulation. Many of the positive regu- 
lators significantly increased activation score, 
with VAV1 causing the strongest activation 
potentiation (Fig. 4F). Thus, many, but not 
all, hits act by tuning overall T cell activation 
to varying degrees. 
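
As a rough illustration of how a signature-based activation score of this kind can be computed, the Python sketch below averages per-gene z-scores over a signature gene set for each cell. The scoring procedure actually used is the one referenced in fig. S18 and is not reproduced here; the function name `activation_scores`, the placeholder gene names, the toy expression values, and the choice of z-score averaging are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def activation_scores(expr, signature):
    """Return one score per cell: the mean z-score across signature genes.

    expr maps gene name -> list of normalized expression values (one per cell).
    This is an assumed scoring scheme, not the published one.
    """
    n_cells = len(next(iter(expr.values())))
    z = {}
    for gene in signature:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no information; score it as 0.
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return [mean(z[g][i] for g in signature) for i in range(n_cells)]

# Hypothetical toy data: four cells, two signature genes, one unrelated gene.
expr = {
    "GENE_A": [0.0, 1.0, 2.0, 3.0],
    "GENE_B": [1.0, 1.0, 3.0, 3.0],
    "GENE_C": [5.0, 5.0, 5.0, 5.0],
}
scores = activation_scores(expr, ["GENE_A", "GENE_B"])
```

In this sketch, cells with higher expression of the signature genes receive higher scores, and the scores are centered on zero across the cells supplied.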

We next investigated how different pertur- 
bations affected the expression of cytokine 
and other effector genes in stimulated cells. 
We analyzed pseudobulk differential gene 
expression under restimulated conditions 
for each sgRNA target cell group compared 
with no-target control cells (fig. S19, A and B). 
IFNG was differentially expressed in 29 dif- 
ferent sgRNA targets, with only sgRNAs tar- 
geting negative regulators causing decreased 
expression. IL2, however, was barely detect-
able by scRNA-seq (fig. S19C). Only IL2 and
VAV1 sgRNAs caused its increased expression,
consistent with our observations that VAV1
activation caused the greatest level of IL-2 re- 
lease (Fig. 3H). Many of the negative regulators 
drove a stereotyped pattern of differential 
Fig. 3. Characterization of CRISPRa screen hits by arrayed profiling.
(A) Schematic of arrayed experiments. (B) Comparison of IL-2 (in CD4+ T cells) and
IFN-γ (in CD8+ T cells) CRISPRa screens, with genes targeted by the arrayed sgRNA
panel indicated, as well as their screen hit categorization. Paralogs of arrayed panel
genes that were also highly ranked hits are additionally indicated. (C) Representative
intracellular cytokine staining flow cytometry for indicated cytokines in control
(NO-TARGET_1 sgRNA) or VAV1 (VAV1_1 sgRNA) CRISPRa T cells after 10 hours of
stimulation. (D) Intracellular cytokine staining of full arrayed sgRNA panel, showing
the percentage of cells that gated positive for the indicated cytokines in CD4+ or CD8+
T cells. Points represent the mean value of four donors, with and without stimulation.
Dashed vertical lines represent the mean no-target control sgRNA value
with stimulation. *q < 0.05, **q < 0.01, Mann-Whitney U test, followed by q value
multiple-comparisons correction. Full data are provided in fig. S11B. The medium
stimulation dose is shown for IL-2 and IFN-γ, and low-dose stimulation is shown for
TNF-α. (E) Scatter plot comparison of log2-fold changes in the percentage of
cytokine-positive cells for arrayed panel sgRNAs versus the mean of no-target control
sgRNAs in stimulated CD4+ and CD8+ cells using the same data from (D). (F) Secreted
cytokine staining arrayed panel grouped by indicated gene categories, with sgRNAs
targeting the IL2 and IFNG genes removed. Points represent a single gene and
donor measurement. *P < 0.05, **P < 0.01, ***P < 0.001, Mann-Whitney U test.
(G) Principal component analysis of secreted cytokine measurements resulting
from the indicated CRISPRa sgRNAs. (H) Heatmap of selected secreted cytokine
measurements grouped by indicated biological category. Values represent the
median of four donors, followed by Z-score scaling for each cytokine.
Fig. 4. CRISPRa Perturb-seq captures diverse T cell states driven by
genome-wide cytokine screen hits. (A) Schematic of CRISPRa Perturb-seq 
experiment. (B) Categorical breakdown of genes targeted by the sgRNA library 
comprising hits from our primary genome-wide CRISPRa cytokine screens as 
indicated. Genes with a summed log2-fold change less than zero across both
screens (diagonal line) are categorized as negative regulators. (C) UMAP 
projection of post-quality control filtered restimulated T cells, colored by blood 
donor. (D) Distribution of CD4* and CD8* T cells across restimulated T cell 
UMAP projection. Each bin is colored by the average log2(CD4/CD8) transcript 
levels of cells in that bin. (E) Restimulated T cell UMAP colored by average cell 
activation score in each bin. (F) Boxplots of restimulated T cells’ activation 
scores grouped by sgRNA target genes. Dashed line represents the median 
activation score of no-target control cells. *P < 0.05, **P < 0.01, ***P < 0.001, 
Mann-Whitney U test with Bonferroni correction. (G) Restimulated T cell 
UMAP with cells colored by cluster. (H) Heatmap of differentially expressed 
marker genes in each cluster. The top 50 statistically significant (FDR < 0.05) 
differentially up-regulated genes for each cluster are shown, with genes that are 
up-regulated in multiple clusters being given priority to the cluster with the 
higher log2-fold change for the given gene. To the right of the heatmap are
(left to right) the top marker genes by log2-fold change in each cluster’s section,
the top overrepresented sgRNAs in each cluster by odds ratio (full data are
provided in fig. S20G), and the top differentially up-regulated cytokine genes in
each cluster. Mean log2(CD4/CD8) transcript values of cells in each cluster are
shown on the far right. (I) Restimulated T cell UMAP with the expression of 
indicated genes shown. (J) Contour density plots of restimulated cells assigned 
to indicated sgRNA targets in UMAP space. The no-target control contour is 
shown in grayscale underneath. “Perturbed cells” represents all cells assigned a 
single sgRNA other than no-target control sgRNAs. 
cytokine gene expression, whereas positive
regulators generally promoted more diverse
cytokine expression patterns than negative
regulators (fig. S19A). TBX21 (T-bet) modulated
the expression of most detectable cytokine 
genes. Furthermore, unlike most perturbations, 
it altered cytokine expression independently 
of stimulation (fig. S19D). 

We next used clustering analysis to charac- 
terize CRISPRa-driven cell states in restimu- 
lated and resting T cells (Fig. 4G and fig. S20). 
For each cluster, we identified the top up- 
regulated gene expression markers and cyto- 
kine genes, contributions of CD4+/CD8+ T cells,
and overrepresented sgRNAs, revealing a di-
verse landscape of T cell states promoted by
CRISPRa (Fig. 4, H to J, and fig. S20, D to G).
Negative cytokine regulators (e.g., MAP4K1) 
were highly enriched in cluster 2, marked by 
LTB expression and low activation score. Only 
GATA3 promoted a T helper 2 (Th2) pheno- 
type (cluster 3), suggesting that altered Th 
differentiation was not a common mechanism 
among negative IFNG regulators. Thus, Perturb-
seq reveals cell states promoted by the over- 
expression of different key regulators. 
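
The overrepresentation of an sgRNA target within a cluster can be summarized with an odds ratio from a 2 x 2 table of cell counts. The Python sketch below shows one conventional way to compute it. The exact calculation behind fig. S20G is not described in this section, so the function name `cluster_odds_ratio`, the 0.5 continuity correction applied when a cell of the table is empty, and the example counts are assumptions for illustration only.

```python
def cluster_odds_ratio(target_in, target_out, other_in, other_out):
    """Odds ratio for enrichment of one sgRNA target in one cluster.

    target_in  : cells with the sgRNA target, inside the cluster
    target_out : cells with the sgRNA target, outside the cluster
    other_in   : all other cells, inside the cluster
    other_out  : all other cells, outside the cluster
    """
    cells = [target_in, target_out, other_in, other_out]
    if any(c < 0 for c in cells):
        raise ValueError("cell counts must be non-negative")
    if any(c == 0 for c in cells):
        # Assumed Haldane-Anscombe correction to avoid division by zero.
        target_in, target_out, other_in, other_out = (c + 0.5 for c in cells)
    return (target_in * other_out) / (target_out * other_in)

# Hypothetical counts: 30 of 40 target cells fall in the cluster,
# versus 20 of 60 cells carrying any other sgRNA.
example_or = cluster_odds_ratio(30, 10, 20, 40)
```

An odds ratio above 1 indicates that cells carrying the sgRNA are found in the cluster more often than other cells; a value below 1 indicates depletion.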

We identified two IL2-expressing clusters,
despite poor capture of the transcript, with
both clusters consisting primarily of CD4+
T cells. Cluster 13 had the higher IL2 expres-
sion of the two and was promoted by VAV1 and
OTUD7B sgRNAs. VAV1 sgRNAs were strongly
enriched in both IFNG- and IL2-expressing clus-
ters, suggesting that VAV1-mediated potentiation
of T cell stimulation may drive differentiation 
toward multiple distinct cytokine-producing 
populations. 

We also identified two distinct clusters of 
cells expressing IFNG (clusters 1 and 12)
and containing both CD4+ and CD8+ T cells.
Cluster 1 was marked by high expression of
CCL3 and CCL4 and was enriched for sgRNAs
with strong activation score potentiation
such as VAV1, CD28, and FOXQ1. By contrast,
cluster 12 was enriched for sgRNAs known to
activate the NF-κB pathway, such as IL1R1,
TRAF3IP2, TNFRSF1A, and TNFRSF1B. These
observations suggest that potentiated stim-
ulation/costimulation may drive T cells to
an activated IFNG-expressing state distinct
from more specific signaling through the
NF-κB pathway. Activation of a subset of
TNFRSF receptor genes (TNFRSF1A, TNFRSF1B,
LTBR, and CD27) also promoted cell states
(clusters 5 and 6) marked by the high ex-
pression of cell cycle genes. LTBR and CD27
sgRNAs were almost exclusively found in
cells of this cluster, whereas TNFRSF1A/B
sgRNAs appeared to push cells to both pro-
liferative and IFNG-expressing states. Thus,
CRISPRa Perturb-seq reveals how regulators of 
cytokine production both tune T cell activation 
and program cells into different stimulation- 
responsive states. 
Discussion

Paired CRISPRa and CRISPRi screens com- 
plement one another to decode the genetic 
programs regulating stimulation-responsive 
cytokine production in primary human T cells. 
CRISPRi identified required cytokine regula- 
tors, whereas CRISPRa uncovered key signal- 
ing bottlenecks in pathway function as well 
as regulators that are not necessarily active 
in ex vivo-cultured T cells. Future screens 
performed in various other experimental con- 
ditions will have the potential to identify addi- 
tional regulators of T cell states and functions. 

The technologies developed in this study 
will enable screening approaches in primary 
human T cells and potentially other primary 
cell types, such as screens for functional non- 
coding regions of the human genome (18, 38, 39).
Furthermore, this screening framework should 
be adaptable to other nonheritable editing 
applications of the CRISPR toolkit (40), con- 
tinuing to expand opportunities to investi- 
gate complex biological questions in primary 
cells, especially when CRISPR perturbations 
are coupled with single-cell analyses. 

Major efforts are underway to discover gene 
modifications that enhance the efficacy of 
adoptive T cell therapies. Although we do not 
expect all perturbations that lead to increased 
cytokine production to translate to enhanced 
in vivo antitumor efficacy, we are encouraged 
by the identification of genes in various stages 
of therapeutic development, including CD5 (41), 
TNFRSF9 (encoding 4-1BB), CD27, CD40, and 
TNFRSF4 (encoding OX40). Recent preclinical 
work (42) highlights c-JUN overexpression 
to limit T cell exhaustion and further enhance 
cell therapies. Thus, loss- and gain-of-function 
discovery platforms can guide efforts to engi- 
neer T cells for different clinical indications. 
Future CRISPRa and CRISPRi screens in hu- 
man T cells will continue to nominate targets 
for improved next-generation cellular therapies. 


Materials and Methods 
Isolation and culture of human T cells 


Human T cells were sourced from PBMC- 
enriched leukapheresis products (Leukopaks, 
STEMCELL Technologies, catalog no. 70500.2) 
from healthy donors, after institutional re- 
view board-approved informed written consent 
(STEMCELL Technologies). Bulk T cells were 
isolated from Leukopaks using EasySep mag- 
netic selection following the manufacturer’s
recommended protocol (STEMCELL Technol- 
ogies, catalog no. 17951). Unless stated other- 
wise, bulk T cells were frozen in Bambanker 
Cell Freezing Medium at 5 x 10” cells/ml 
(Bulldog Bio, catalog no. BB01) and kept at
-80°C for short-term storage or in liquid 
nitrogen for long-term storage immediate- 
ly after isolation. Unless otherwise noted, 
thawed T cells were cultured in X-VIVO 15 
(Lonza Bioscience, catalog no. 04-418Q) sup- 
plemented with 5% fetal calf serum (FCS),
55 µM 2-mercaptoethanol, 4 mM N-acetyl
L-cysteine, and 500 IU/ml of recombinant 
human IL-2 (Amerisource Bergen, catalog no. 
10101641). Primary T cells were activated 
using anti-human CD3/CD28 CTS Dynabeads 
(Fisher Scientific, catalog no. 40203D) at a
1:1 cell:bead ratio at 10° cells/ml. 


Cell line maintenance 


Lenti-X HEK293T cells (Takara Bio, catalog 
no. 632180) were maintained in high-glucose 
Dulbecco’s modified Eagle’s medium with 
GlutaMAX (Fisher Scientific, catalog no. 
10566024), supplemented with 10% FCS, 
100 U/ml of penicillin/streptomycin (PenStrep; 
Fisher Scientific, catalog no. 15140122), 1 mM 
sodium pyruvate (Fisher Scientific, catalog 
no. 11360070), 1x minimal essential medium 
(MEM) nonessential amino acids (Fisher 
Scientific, catalog no. 11140050), and 10 mM 
HEPES solution (Sigma-Aldrich, catalog no. 
H0887-100ML). Cells were passaged every 
2 days using Tryple Express (Fisher Scientific, 
catalog no. 12604013) for dissociation and 
maintained at <60% confluency. 

NALM6 cells were engineered by the
Eyquem laboratory at University of California
San Francisco (UCSF) to express NY-ESO-1
peptide in an HLA-A0201 background,
recognizable with the 1G4 TCR, and were provided for TCR
stimulation coculture experiments. For sim-
plicity, these cells are referred to as NALM6. 
NALM6 cells were cultured in RPMI (Invitro- 
gen, catalog no. 21870092) supplemented with 
10% FCS, 100 U/ml PenStrep (Fisher Scientific, 
catalog no. 15140122), 1 mM sodium pyruvate
(Fisher Scientific, catalog no. 11360070),
1x MEM nonessential amino acids (Fisher
Scientific, catalog no. 11140050), 10 mM HEPES
solution (Sigma-Aldrich, catalog no. H0887-
100ML), and 2 mM L-glutamine (Lonza Bio-
science, catalog no. 17-605E). 


Plasmids 


dCas9-VP64 was derived from lentiSAMv2
(Addgene, catalog no. 75112) and cloned into
the lentiCRISPRv2-dCas9 backbone (Addgene, 
catalog no. 112233) with Gibson Assembly. The 
promoter was switched to SFFV and mCherry 
was introduced upstream of dCas9-VP64, sep- 
arated by a P2A sequence resulting in the 
pZR112 plasmid. The LTR-LTR range was min- 
imized to enhance lentiviral titer. For CRISPRi, 
BFP in pHR-SFFV-dCas9-BFP-KRAB (Addgene, 
catalog no. 46911) was switched to mCherry 
with Gibson Assembly, resulting in pZR071.

Single sgRNAs for arrayed experiments
were introduced by Golden Gate clon-
ing as described previously (22). Briefly, DNA
oligomers with Golden Gate overhangs were 
annealed and subsequently cloned into the 
nondigested target plasmid using the Golden 
Gate Assembly Kit (BsmBI-v2, New England 
Biolabs, catalog no. E1602L). sgRNAs were
cloned into pXPR_502 (Addgene, catalog no. 
96923) for CRISPRa and into CROPseq-Guide- 
Puro (43) (Addgene, catalog no. 86708) for 
CRISPRi. All single sgRNAs used in this study 
can be found in table S3.

The genome-wide CRISPRa (Calabrese A, 
catalog no. 92379 and Calabrese B, catalog 
no. 92380) and CRISPRi libraries (Dolcetto A, 
catalog no. 92385 and Dolcetto B, catalog no. 
92386) (22) were obtained from Addgene. Forty 
nanograms of each library were transformed 
into Endura ElectroCompetent Cells (Lucigen, 
catalog no. 60242-2) following the manu- 
facturer’s instructions. After transformation, 
Endura cells were grown in a shaking incuba- 
tor for 16 hours at 30°C in the presence of 
ampicillin. Library plasmid was isolated
using the Plasmid Plus MaxiKit (Qiagen, cat- 
alog no. 12963) and sequenced for sgRNA rep- 
resentation as described under the section titled 
“Genome-wide CRISPRa and CRISPRi screens.” 

For cDNA-mediated target overexpres- 
sion, the lentiCRISPRv2 (Addgene, catalog 
no. 75112) backbone was rebuilt to a lentiviral 
cDNA cloning plasmid with an SFFV promoter 
followed by BsmBI restriction sites and P2A- 
Puro. Transgene cDNAs were purchased from 
Genscript, choosing the canonical (longest) 
isoform for each gene, and BsmBI restriction 
sites were introduced by polymerase chain 
reaction (PCR). The final lentiviral transfer 
plasmids were assembled using the Golden 
Gate Assembly Kit (BsmBI-v2, New England 
Biolabs, catalog no. E1602L). 

To clone direct-capture compatible CRISPRa- 
SAM plasmids for Perturb-seq, different sgRNA
designs were synthesized as G-Blocks (Inte- 
grated DNA Technologies) and cloned into 
pXPR_502 (Addgene, catalog no. 96923) by 
Gibson assembly, replacing its sgRNA cassette. 


Lentivirus production 


Unless otherwise stated, human embryonic 
kidney (HEK) 293T cells were seeded in Opti- 
MEM I Reduced Serum Medium (OPTI-MEM) 
with GlutaMAX Supplement (Invitrogen, cat- 
alog no. 31985088) supplemented with 5% 
FCS, 1 mM sodium pyruvate (Fisher Scientific), 
and 1x MEM nonessential amino acids (Fisher 
Scientific) (cOPTI-MEM) at 3.6 x 10’ cells per
T225 flask in 45 ml of medium overnight to 
achieve confluency between 85 and 95% at 
the time point of transfection. The following 
morning, HEK293T cells were transfected
with second-generation lentiviral packag- 
ing plasmids and transfer plasmid using 
Lipofectamine 3000 transfection reagent 
(Fisher Scientific, catalog no. L3000075). 
Briefly, 165 µl of Lipofectamine 3000 reagent
was added to 5 ml of room-temperature 
OPTI-MEM without supplements. Forty-two 
micrograms of Cas9 transfer plasmid, 30 µg
of psPAX2 (Addgene 12260), 13 µg of pMD2.G
(Addgene 12259), and 145 µl of P3000 reagent
were added to 5 ml of room-temperature OPTI- 
MEM without supplements and mixed by gen- 
tle inversion. The plasmid and Lipofectamine 
3000 mixtures were combined, mixed by gen- 
tle inversion, and incubated for 15 min at 
room temperature. After incubation, 20 ml of 
medium was removed from the T225 flask and 
the 10-ml transfection mixture was carefully 
added without detaching HEK293T cells. 
After 6 hours, the transfection medium was 
replaced with 45 ml of cOPTI-MEM supple-
mented with 1x ViralBoost (Alstem Bio, catalog 
no. VB100). Lentiviral supernatant was har- 
vested 24 hours after transfection (first harvest) 
and replaced with 45 ml of fresh cOPTI-MEM. 
A second harvest was performed 48 hours 
after transfection. Immediately after collec- 
tion, the medium was centrifuged at 500g for 
5 min at 4°C to clear cellular debris. Unless 
otherwise noted, Lenti-X-Concentrator (Takara 
Bio, catalog no. 631232) was added to the col- 
lected supernatant, and lentivirus was concen- 
trated following the manufacturer’s instructions 
and resuspended in OPTI-MEM in 1% of the 
original culture volume without supplements. 
Lentiviral particles were subsequently aliquoted 
and frozen at -80°C.


Flow cytometry 


Aria 2, Aria 3, and Aria Fusion cell sorters (BD 
Biosciences) at the UCSF Parnassus Flow Core 
and the Gladstone Institute Flow Core were 
used for sorting. The Attune NxT (Thermo 
Fisher Scientific) and LSRFortessa X-20 (BD 
Biosciences) flow cytometers were used for 
flow cytometry. Antibodies used for flow cyto- 
metric analyses and sorting are summarized 
in table S4. 


Intracellular cytokine staining 


Unless indicated otherwise, T cells were stim- 
ulated with ImmunoCult Human CD3/CD28/ 
CD2 T Cell Activator (STEMCELL Technologies, 
catalog no. 10990) with 6.25 µl/ml of culture
medium at 2 x 10° cells/ml. One hour after 
restimulation, Golgi Plug protein transport 
inhibitor (BD Biosciences, catalog no. 555029) 
was added at a 1/1000 dilution. Nine hours 
after the addition of Golgi Plug, T cells were 
stained for surface antigens before fixation 
and subsequently processed for intracellu- 
lar cytokine staining using the BD Cytofix/ 
Cytoperm kit instructions (BD Biosciences, 
catalog no. 554714). 


Genome-wide CRISPRa and CRISPRi screens 


One day after activation, T cells from two hu- 
man blood donors were infected with 2% v/v 
concentrated dCas9-VP64 lentivirus. Two 
days after activation, T cells were split into 
two populations and infected with 1% v/v
[multiplicity of infection (MOI) ~0.5] Calabrese
Set A (Addgene, catalog no. 92379) or 0.8% v/v
(MOI ~0.5) Calabrese Set B (Addgene, cata-
log no. 92380) lentivirus. These two sets were 
independently cultured and processed in parallel 
until analysis. Three days after activation, 
fresh medium with IL-2 (final concentration 
500 IU/ml) and puromycin (final concen- 
tration 2 µg/ml) was added to bring cells to
3 x 10° cells/ml. Cells were split 2 days later 
and fresh medium with IL-2 was added to 
bring cells to 3 x 10°cells/ml. Two days later, 
fresh medium without IL-2 was added to bring 
the concentration to 10°/ml. Eight days after 
initial activation, cells were harvested, cen- 
trifuged at 500g for 5 min, and resuspended 
at 2 x 10° cells/ml X-VIVO 15 without sup- 
plements. The following day, cells were restimu- 
lated and stained for FACS as described under 
the “Intracellular cytokine staining” section. 
Over the subsequent 2 days, cells were sorted 
at the Parnassus Flow Cytometry Core (PFCC) 
facility into IL-2lo and IL-2hi CD4+ and IFN-γlo
and IFN-γhi CD4+ T cell populations (see fig.
S3C for gating strategy). Sorted cells were 
stored in EasySep Buffer (phosphate-buffered 
saline with 2% FCS and 1 mM EDTA) over- 
night until genomic DNA isolation. 

The same experimental procedure using 
T cells from the same donors was followed 
for the CRISPRi screens. T cells were infected 
with dCas9-mCherry-KRAB at 2% v/v and 
Dolcetto A (Addgene, catalog no. 92385) and B 
(Addgene, catalog no. 92386) sgRNA libraries 
at 10% v/v or 25% v/v unconcentrated virus, re- 
spectively (~0.5 MOI).

Genomic DNA was extracted from fixed cells 
as described previously (44). Integrated sgRNA 
sequences were amplified as described previ- 
ously (22), and sequencing libraries were subse- 
quently agarose gel purified using NucleoSpin 
Gel and PCR Clean-up Mini kit (Macherey-Nagel,
catalog no. 740609.50). Libraries were sequenced 
on a NextSeq500 instrument to a targeted depth 
of 100-fold coverage. 

For the supplementary CD4+ T cell set of
genome-wide CRISPRa screens, CD4+ T cells
were isolated from Leukopaks using magnetic 
negative selection (STEMCELL Technologies, 
catalog no. 17952) and subsequently stimulated 
as described in the section entitled “Isolation 
and culture of human T cells.” T cells were then 
cultured and infected with lentivirus as de- 
scribed for the primary CRISPRa screens above. 
For library lentivirus production, Calabrese Set
A and Set B plasmids were mixed at equimolar
ratios before transfection, and the pooled
lentiviral particles from both sets were used
for transduction. CD4 flow cytometry stain- 
ing on day 7 after T cell activation confirmed 
>98% purity. T cells were further processed 
and restimulated as described above. T cells 
were separately stained for IL-2, IFN-γ, or
TNF-α for FACS. After our initial analysis, it
appeared that the IFN-γ screen was potentially
undersampled because of lower hit resolution
than the other screens. To address this, addi-
tional fixed cells from the same experiment 
were stained and sorted as an additional techni- 
cal replicate and then computationally merged 
(described below). 


CRISPR screen analysis 


Reads were aligned to the appropriate refer- 
ence library using MAGeCK version 0.5.9.2 
(45) using the --trim-5 22,23,24,25,26,28,29,30
argument to remove the staggered 5’ adapter. 
Next, raw read counts across both library sets 
were normalized to the total read count in each 
sample, and each of the matching samples 
across two sets were merged to generate a 
single normalized read count table. Normal- 
ized read counts in high versus low bins were 
compared using mageck test with --norm-method
none, --paired, and --control-sgrna options,
pairing samples by donor and using non- 
targeting sgRNAs as controls, respectively. 
Genes were classified as hits if they had a median
absolute log2-fold change >0.5 and a false dis-
covery rate (FDR) <0.05. For supplemental CD4+
screens (fig. S9), reads were aligned to the full
Calabrese A and B library in a single reference
file. For the supplemental CD4+ IFN-γ screen,
which was sorted and sequenced as two tech- 
nical replicates, normalized counts were aver- 
aged across technical replicates before analysis 
with mageck test. 
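
The count handling and hit thresholds described above can be sketched in a few lines of Python. This is an illustrative outline rather than the analysis code used in the study: the function names, the scaling constant of one million reads, and the toy counts are assumptions, and the actual statistics (median log2-fold change and FDR per gene) come from mageck test, which is not reimplemented here.

```python
def normalize_to_total(counts, scale=1_000_000):
    """Scale raw sgRNA read counts so that the sample sums to `scale`.

    The scaling constant is an assumption; the text states only that counts
    were normalized to the total read count in each sample.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {sgrna: n * scale / total for sgrna, n in counts.items()}

def merge_sets(norm_a, norm_b):
    """Combine normalized counts from the two library sets into one table."""
    overlap = norm_a.keys() & norm_b.keys()
    if overlap:
        raise ValueError(f"sgRNA present in both sets: {sorted(overlap)}")
    return {**norm_a, **norm_b}

def is_hit(median_lfc, fdr, lfc_cutoff=0.5, fdr_cutoff=0.05):
    """Apply the stated thresholds: |median log2-fold change| > 0.5, FDR < 0.05."""
    return abs(median_lfc) > lfc_cutoff and fdr < fdr_cutoff

# Hypothetical raw counts for one sorted sample, split by library set.
set_a = {"sgA_1": 100, "sgA_2": 300}
set_b = {"sgB_1": 50, "sgB_2": 150}
table = merge_sets(normalize_to_total(set_a), normalize_to_total(set_b))
```

Normalizing each set separately before merging keeps differences in sequencing depth between the two sets from biasing the combined table, which is why the sketch passes the merged table onward without further normalization, mirroring the --norm-method none setting.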


Gene-set enrichment analysis 


Gene-set enrichment analysis (GSEA) was com- 
pleted with the fgsea Bioconductor R package 
using the default settings (46). KEGG pathways 
version 7.4 were obtained from GSEA MSigDB
(http://www.gsea-msigdb.org/gsea/downloads.jsp).
The KEGG NF-κB signaling pathway
(entry hsa04064) was missing from this data-
set and was added manually from
https://www.genome.jp/entry/pathway+hsa04064.


Stratified linkage disequilibrium score analysis 


GWAS summary statistics were downloaded 
from the Price laboratory website
(https://alkesgroup.broadinstitute.org/sumstats_formatted/
and https://alkesgroup.broadinstitute.org/UKBB/).
Linkage disequilibrium
(LD) scores were created for each screen [cor- 
responding to a set of single-nucleotide poly- 
morphisms (SNPs) within 100 kb of genes 
identified as significant hits in each screen 
or their corresponding matched background 
sets] using the 1000G Phase 3 population ref- 
erence. Each annotation’s heritability enrich- 
ment for a given trait was computed by adding 
the annotation to the baselineLD model and 
regressing against trait chi-squared statistics 
using HapMap3 SNPs with the stratified LD 
score regression package (47). Heritability 
enrichments were then meta-analyzed across 
immune or nonimmune traits using inverse 
variance weighting. The sets of background 
genes were sampled from the set of all genes
that were expressed in the control sgRNA, 
stimulated bulk RNA-Seq data. For each screen, 
the background genes were sampled to match 
the significant screen hits in number and based 
on deciles of gene expression. Immune traits 
used for analysis were: “Eosinophil Count,” 
“Lymphocyte Count,” “Monocyte Count,” “White 
Count,” “Autoimmune Disease All,” “Allergy 
Eczema Diagnosed,” “Asthma Diagnosed,” 
“Celiac,” “Crohn’s Disease,” “Inflammatory 
Bowel Disease,” “Lupus,” “Multiple Sclerosis,” 
“Primary Biliary Cirrhosis,” “Rheumatoid Ar- 
thritis,” “Type 1 Diabetes,” “Ulcerative Colitis.” 
Nonimmune traits used were: “Heel Tscore,”
“Balding1,” “Balding4,” “Bmi,” “Height,” “Type 2
Diabetes,” “Neuroticism,” “Anorexia,” “Autism,” 
“Bipolar Disorder,” “Depressive Symptoms,” 
“Fasting Glucose,” “Hdl,” “Ldl,” “Triglycerides,” 
and “Fasting Glucose.” 
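The inverse-variance weighting used to meta-analyze heritability enrichments across traits can be illustrated with a minimal Python sketch. It assumes a fixed-effect model in which each trait contributes one enrichment estimate and its standard error; the function name and input layout are illustrative and are not taken from the stratified LD score regression package.

```python
import math


def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect inverse-variance-weighted mean and its standard error.

    estimates: per-trait enrichment estimates (hypothetical inputs).
    standard_errors: matching per-trait standard errors (must be > 0).
    """
    if not estimates or len(estimates) != len(standard_errors):
        raise ValueError("need equally sized, non-empty inputs")
    # Each trait is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se
```

Traits with smaller standard errors dominate the pooled estimate, which is the intended behavior when traits differ widely in GWAS sample size.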


Arrayed CRISPRa experiments 


For each gene chosen to target in follow-up 
experiments, one sgRNA was chosen from the 
Calabrese library used in screens. The first 
sgRNAs (“_1”) were manually chosen for consistent
log₂-fold change observed in both donors.
The second sgRNA (“_2”) was picked
from the hCRISPRa-v2 genome-wide library 
(48), choosing the top-ranked sgRNA not 
present in Calabrese libraries for each gene. 
sgRNAs were cloned into the pXPR_502 vector 
as described in the plasmid section. 

Primary human T cells were transduced 
with 2% v/v mCherry-2A-dCas9-VP64 lentivirus 
(pZR112) 1 day after activation. The following 
day (day 2), the dCas9-VP64-transduced cells 
were split into 96-well flat-bottom plates, 
avoiding edge wells, and transduced with 
a different sgRNA lentivirus in each well 
(5% v/v). One day after sgRNA transduction, 
fresh medium was added with IL-2 (500 IU/ml) 
and 2 µg/ml puromycin (final culture concentrations).
Cells were passaged 2 days later,
adding fresh medium with 500 IU/ml of IL-2 
and maintaining a concentration of 3 × 10⁵
to 1 × 10⁶ cells/ml, with 96-well plates copied
as needed to maintain this concentration. On 
day 8, cells from copied plates were pooled 
and samples were counted. Cells were pelleted
and resuspended at a concentration of
2 × 10⁶ cells/ml in fresh X-VIVO-15 without
additives. On day 9, cells were restimulated 
with anti-CD3/CD28/CD2 ImmunoCult T Cell 
Activator (as described in the “Intracellular 
cytokine staining” section) or left resting. 


RT-qPCR 


T cells were prepared as described under 
the “Arrayed CRISPRa experiments” section. 
Seven days after sgRNA transduction, 100,000 
T cells per well were pelleted at 500g for 5 min 
at 4°C. Cells were lysed and RNA was extracted 
using the Quick-RNA 96 kit (Zymo Research)
following the manufacturer’s protocol but skipping
the optional in-well DNase treatment.
DNase treatment and cDNA synthesis were 
subsequently completed with Maxima First 
Strand cDNA Synthesis Kit for reverse tran- 
scription quantitative PCR (RT-qPCR) with 
double-stranded DNase (Thermo Fisher Scientific).
qPCR was performed with the PrimeTime
PCR Master Mix (Integrated DNA Technologies) 
and PrimeTime qPCR probe assays (Integrated 
DNA Technologies; a list of probes used is 
provided in table S5) on an Applied Biosystems 
QuantStudio 5 real-time PCR system. Data were
analyzed using the ΔΔCt method. The mean Ct
values of two housekeeping genes, PPIA and
GUSB, were used to calculate the ΔCt, and the mean ΔCt
of nontargeting controls was used to calculate the ΔΔCt.
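The ΔΔCt arithmetic described above can be written out as a short Python sketch. The function names are hypothetical, and the final conversion to relative expression (2 to the power of −ΔΔCt) is the conventional step that assumes near-100% amplification efficiency; the text does not state that this conversion was applied.

```python
from statistics import mean


def delta_ct(target_ct, housekeeping_cts):
    """Ct of the target gene minus the mean Ct of the housekeeping genes."""
    return target_ct - mean(housekeeping_cts)


def delta_delta_ct(sample_dct, control_dcts):
    """Sample dCt minus the mean dCt of the nontargeting controls."""
    return sample_dct - mean(control_dcts)


def relative_expression(ddct):
    """Conventional 2^(-ddCt) conversion (assumes ~100% PCR efficiency)."""
    return 2.0 ** (-ddct)
```

For example, a target Ct of 24 with housekeeping Cts of 20 and 22 gives a ΔCt of 3; against nontargeting controls with a mean ΔCt of 5, the ΔΔCt is −2, corresponding to a fourfold increase under the stated assumption.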


cDNA experiments 


See fig. S13A for an experimental overview. 
One day after activation, T cells were trans- 
duced with the 1G4 TCR lentivirus recognizing 
the NY-ESO-1 antigen or left nontransduced for
the ImmunoCult assay. One day later, cells were
transduced with the transgenes in cDNA 
format. Three days after initial activation, 
puromycin was added to a final concentration
of 2 µg/ml, along with fresh X-VIVO
15 medium with 500 IU/ml of IL-2, and cells were
further cultured and expanded as in the
genome-wide CRISPR screens. Nine days after
initial activation, T cells were centrifuged and 
resuspended at 2 × 10⁶ cells/ml in X-VIVO 15
without supplements. On the same day, 1G4 
TCR expression was assessed by flow cytometry 
after dextramer staining (Immudex, catalog no. 
WB3247-PE) to ensure even expression across 
different cDNA constructs. The following day, 
T cells were restimulated with either 6.25 µl/ml
of ImmunoCult or NALM6 cells at an effector:target
ratio of 1:2 for 1G4 TCR-transduced cells.
Cells were further processed as described under 
the “Intracellular cytokine staining” section. 
CD22 was used as a marker for NALM6 cells to 
discriminate them from T cells in the coculture. 
Overexpression of OTUD7B cDNA together with 
the 1G4 TCR (but not alone) caused toxicity 
and was therefore excluded from analyses. Two 
donors were excluded from the 1G4 TCR assay 
because of poor TCR transduction. 


Cytokine Luminex assay 


T cells were prepared as explained under the 
“Arrayed CRISPRa experiments” section. 
On day 9 after activation, T cells at a concentration
of 2 × 10⁶ cells/ml were restimulated
with ImmunoCult Human CD3/CD28/CD2
(STEMCELL Technologies, catalog no.
10970) at 6.25 µl/ml. Twenty-four hours after
restimulation, supernatant was collected and
frozen at −20°C. After a serial pilot titration,
cytokine analyses were performed at a 1/200 
dilution by Eve Technologies with the Luminex 
xMAP technology on the Luminex 200 system
(Luminex). To remove very lowly expressed
cytokines from downstream analysis, a cytokine
was removed if it was undetectable in three of
four donors in any group.
Additionally, the sgIL1R1-1 donor 4 measure- 
ment for IL-l was removed manually because 
this was an extremely high outlier. 
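One reading of the low-expression filter can be sketched in Python as follows. The data layout (a nested dictionary with None marking an undetectable measurement) and the interpretation that three or more undetectable donors in any single group removes the cytokine are assumptions made for illustration.

```python
def filter_cytokines(measurements, min_undetected=3):
    """Drop cytokines that are undetectable in too many donors of any group.

    measurements: {cytokine: {group: [value or None for each donor]}}
    (hypothetical layout; None marks an undetectable measurement).
    """
    kept = {}
    for cytokine, groups in measurements.items():
        # A cytokine is dropped if any single group reaches the threshold.
        too_sparse = any(
            sum(value is None for value in donors) >= min_undetected
            for donors in groups.values()
        )
        if not too_sparse:
            kept[cytokine] = groups
    return kept
```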


Bulk RNA-seq sample preparation 


FOXQ1 and nontargeting sgRNA control primary
human T cells from four donors were
transduced and expanded as described in the 
“Arrayed CRISPRa experiments” section. 
On day 8, mCherry+CD4+ populations were
sorted and resuspended in X-VIVO-15 without
additives at 2 × 10⁶ cells/ml. On day 9, cells
were restimulated with 6.25 µl/ml of anti-CD3/
CD28/CD2 ImmunoCult or left unperturbed
for the resting (nonstimulated) condition.
Twenty-four hours later, cells were lysed for RNA.
RNA was purified using the Quick-RNA 
Microprep kit (Zymo Research) without the 
optional in-well DNase treatment step. Purified 
RNA was treated with TURBO DNase (Thermo 
Fisher Scientific) to remove potential contam- 
inating DNA. RNA was subsequently purified 
using the RNA Clean & Concentrator-5 kit 
(Zymo Research). RNA quality control was 
performed using an RNA ScreenTape assay 
(Agilent Technologies), with all samples having 
an RNA integrity number >7. RNA-seq libraries 
were prepared using the Illumina Stranded 
mRNA Prep kit with 100 ng of input RNA. 
Libraries were sequenced using paired-end
72-bp reads on a NextSeq500 instrument to an
average depth of 3.2 × 10⁷ clusters per sample.


Bulk RNA-seq data analysis 


Adapters were trimmed from fastq files using 
cutadapt version 2.10 (49) with default settings 
keeping a minimum read length of 20 bp. 
Reads were mapped to the human genome
GRCh38, keeping only uniquely mapping
reads, using STAR version 2.7.5b (50) with
the setting “--outFilterMultimapNmax 1.” Reads
overlapping genes were then counted using
featureCounts version 2.0.1 (51) with the setting
“-s 2” and using the Gencode version 35 basic
transcriptome annotation.

The count matrix was imported into R. Only 
genes with at least 1 count per million across 
at least four samples were kept. TMM nor- 
malized counts were used for heatmaps. Dif- 
ferentially expressed genes between FOXQ1 
overexpression and control samples were 
then identified using limma version 3.44.3 
(52) while controlling for any differences 
between donors. Significant differentially ex- 
pressed genes were defined as having an FDR- 
adjusted P value <0.05. 
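The expression filter applied before differential testing (at least 1 count per million in at least four samples) can be sketched in plain Python. This is an illustration of the rule only, not the R code used in the analysis; the function names and the dictionary layout are hypothetical, and library sizes are taken as simple column sums without TMM scaling.

```python
def counts_per_million(counts):
    """Convert raw counts to CPM using each sample's total count.

    counts: {gene: [raw count in each sample]} (hypothetical layout).
    Assumes every sample has a nonzero total.
    """
    n_samples = len(next(iter(counts.values())))
    library_sizes = [
        sum(row[j] for row in counts.values()) for j in range(n_samples)
    ]
    return {
        gene: [1e6 * c / library_sizes[j] for j, c in enumerate(row)]
        for gene, row in counts.items()
    }


def filter_expressed_genes(counts, min_cpm=1.0, min_samples=4):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return [
        gene
        for gene, row in cpm.items()
        if sum(value >= min_cpm for value in row) >= min_samples
    ]
```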


Perturb-seq library design and cloning 


The CRISPRa Perturb-seq target genes were
selected from the primary IL-2 and IFN-γ
CRISPRa screen results. First, genes that had a
significant fitness defect were removed from
the gene list (fig. S5). Next, genes were ranked
by median sgRNA log₂-fold change, and the
top-ranked gene not previously selected was
picked in the following order: (1) IL-2-positive
hit, (2) IFN-γ-positive hit, (3) IL-2-positive hit,
(4) IFN-γ-positive hit, and (5) IL-2- or IFN-γ-negative
hit (alternating each round), such that
positive hits outnumbered negative hits at a
4:1 ratio. Only hits that were significant (FDR
< 0.05) were selected in each round. The one
exception was TCF7, which was added manually
because we considered it worthwhile to
analyze due to its known effects on T cell function.
To select sgRNAs, the top two enriched
sgRNAs by log₂-fold change in the screen for
which the gene was selected were used. The
library was ordered as pooled single-stranded 
oligos, PCR amplified, and cloned into the 
CRISPRa-SAM direct-capture design I cloning 
vector (pZR158). 
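The round-based target selection can be sketched in Python as below. The sketch assumes that the four input lists have already been restricted to significant hits (FDR < 0.05), stripped of genes with fitness defects, and ranked by median sgRNA log₂-fold change; the function name, arguments, and stopping rule are hypothetical, and the manual addition of TCF7 is not modeled.

```python
def select_targets(il2_pos, ifng_pos, il2_neg, ifng_neg, n_targets):
    """Pick genes in rounds of four positive hits and one negative hit.

    Each argument is a ranked list of gene names (best first). Within a
    round the order is IL-2+, IFN-g+, IL-2+, IFN-g+, then one negative hit
    whose source screen alternates between rounds.
    """
    selected = []
    chosen = set()

    def take_next(ranked):
        # Take the top-ranked gene that has not been selected yet.
        for gene in ranked:
            if gene not in chosen:
                chosen.add(gene)
                selected.append(gene)
                return True
        return False

    round_index = 0
    while len(selected) < n_targets:
        negative = il2_neg if round_index % 2 == 0 else ifng_neg
        progressed = False
        for ranked in (il2_pos, ifng_pos, il2_pos, ifng_pos, negative):
            if len(selected) >= n_targets:
                break
            progressed = take_next(ranked) or progressed
        if not progressed:
            break  # every list is exhausted
        round_index += 1
    return selected
```

Because a gene can be a hit in both screens, the "not previously selected" check is what keeps a shared hit from being counted twice.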


Perturb-seq sample preparation and sequencing 


Bulk CD3+ primary human T cells from two
donors were transduced and cultured as de- 
scribed in the “Genome-wide CRISPRa and 
CRISPRi screens” section, except that library transduction
was completed at a lower MOI of 0.3.
Cells in the stimulated condition were stimulated
with 6.25 µl/ml of anti-CD3/CD28/CD2
ImmunoCult. Twenty-four hours later, cells
from both the stimulated and nonstimulated
conditions were sorted for mCherry+ (marking
dCas9-VP64). Sorted cells were processed to
single-cell RNA-seq and sgRNA sequencing 
libraries by the Institute for Human Genetics 
(IHG) Genomics Core using Chromium Next 
GEM Single Cell 3’ Reagent Kit version 3.1 
with feature barcoding technology for CRISPR 
screening, following the manufacturer’s proto- 
col. Before loading the Chromium chip, sorted 
cells from two blood donors were normalized
to 1000 cells/µl and mixed at a 1:1 ratio for
each condition. Twenty microliters of cell suspension
was loaded into four replicate wells
per condition, for a total of 80,000 cells loaded
per condition. Final sgRNA sequencing libraries
were further purified for the correct size
fragment by 4% agarose E-Gel EX Gels (Thermo 
Fisher Scientific) and gel extracted. Libraries 
were sequenced over two NovaSeq S4 lanes 
(two stimulated wells and two nonstimulated 
wells per lane) at a 2:1 molar ratio of the gene 
expression libraries to sgRNA libraries. 


Perturb-seq analysis 


Alignments and count aggregation of gene 
expression and sgRNA reads were completed 
with Cell Ranger version 6.1.1. Gene expression 
and sgRNA reads were aligned using cellranger 
count, with default settings. Gene expression 
reads were aligned to the “refdata-gex-GRCh38-2020-A”
human transcriptome reference downloaded
from 10x Genomics. sgRNA reads were
aligned to the Perturb-seq library using the 
pattern (BC)GTTTAAGAGCTATG. Counts were 
aggregated with cellranger aggr with default 
arguments. To assign sgRNAs to cells, cellranger 
count output files “protospacer_calls_per_cell.csv” 
were used, filtering out droplets with >1 
sgRNA called, returning a median of 133 sgRNA 
UMIs in sgRNA singlets. For increased stringency,
only droplets with ≥5 sgRNA UMIs
were used in further analysis.
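The singlet assignment and UMI threshold can be sketched in Python as follows. The column names (cell_barcode, num_features, feature_call, num_umis) are an assumption about the layout of “protospacer_calls_per_cell.csv” and should be checked against the actual Cell Ranger output; the function name is hypothetical.

```python
import csv
import io


def assign_sgrna_singlets(csv_text, min_umis=5):
    """Return {cell_barcode: (sgRNA, UMI count)} for single-sgRNA droplets.

    Droplets with more than one called sgRNA are discarded, as are singlets
    with fewer than min_umis sgRNA UMIs.
    """
    assignments = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["num_features"]) != 1:
            continue  # more than one sgRNA called in this droplet
        umis = int(row["num_umis"])
        if umis >= min_umis:
            assignments[row["cell_barcode"]] = (row["feature_call"], umis)
    return assignments
```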

Cell donors were genetically demultiplexed
using Souporcell (53) (https://github.com/wheaton5/souporcell).
The input for each run
was the bam file and barcodes.tsv file from
the cellranger count output and the reference
fasta. Donor calls across wells were harmonized
using the vcf file outputs from Souporcell
with a publicly available Python script
(https://github.com/hyunminkang/apigenome/blob/master/scripts/vcf-match-sample-ids).

Gene expression data were imported and
analyzed in R with the Seurat version 4.0.3
Read10X function (54). Cells were initially
quality filtered for percentage of mitochondrial
reads <25% and number of detected RNA
features >400 and <6000, removing 4% of cells.
After filtering, a median of 401 cells per sgRNA
target gene per condition [median of 127 sgRNA
unique molecular indices (UMIs) per singlet]
were recovered, along with ~2000 cells with 
no-target control guides per condition. Four 
sgRNA targets, HELZ2, TCF7, PRDM1, and 
IRX4, were removed from downstream analy- 
sis because of low cell counts (<100). 

Gene-expression counts were normalized 
and transformed using the Seurat SCTransform 
function (55), with the following variables 
regressed: percentage mitochondrial reads,
S-phase score, and G2/M-phase score, performing
the regression as described on the Satija
laboratory website (https://satijalab.org/seurat/articles/cell_cycle_vignette.html).
Normalized
and transformed counts were used for all downstream
analysis. To call CD4+ and CD8+ T cells,
a CD4/CD8 score was calculated for each cell using
the following formula: log₂[CD4/mean(CD8A,
CD8B)], with a score <−0.9 called as a CD8+ cell
and a score >1.4 called as a CD4+ cell (fig. S17G).
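The CD4/CD8 call can be illustrated with a minimal Python sketch. It assumes a base-2 logarithm and strictly positive expression values; how zero or negative normalized values were handled is not stated, so the sketch simply leaves such cells unassigned. Function names are hypothetical.

```python
import math


def cd4_cd8_score(cd4, cd8a, cd8b):
    """log2 of CD4 expression over the mean of CD8A and CD8B expression.

    Returns None when the ratio is undefined (an assumption of this sketch).
    """
    cd8_mean = (cd8a + cd8b) / 2.0
    if cd4 <= 0 or cd8_mean <= 0:
        return None
    return math.log2(cd4 / cd8_mean)


def call_lineage(score, cd8_max=-0.9, cd4_min=1.4):
    """Apply the score thresholds given in the text."""
    if score is None:
        return "unassigned"
    if score < cd8_max:
        return "CD8"
    if score > cd4_min:
        return "CD4"
    return "unassigned"
```

Cells whose score falls between the two thresholds are left uncalled rather than forced into a lineage.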

For both restimulated and resting condi- 
tions, UMAP reduction was performed with 
dimensions 1 to 20, and otherwise default 
settings of the RunUMAP Seurat function. 
For clustering, FindClusters was run using 
algorithm 3, resolution 0.4 for the restimu- 
lated condition and resolution 0.5 for the 
resting condition. Two clusters in the restimu- 
lated condition were manually merged to form 
“Cluster 2: Negative Regulators.” The merged 
clusters showed highly similar gene expres- 
sion patterns, with one cluster containing the 
bulk of cells containing negative regulator 
sgRNAs and the other containing sgRNAs targeting
the negative regulator MUC1. Cluster
trees shown were generated using the Seurat
BuildClusterTree function with default argu- 
ments. For pseudobulk differential expression 
analyses, the Seurat FindMarkers function 
was used with the default method, Wilcoxon 
rank sum test. 

To generate the T cell activation score, pseu- 
dobulk differential expression analysis was 
first performed on restimulated versus resting
no-target control sgRNAs, and log₂-fold
change outputs were used as gene weights.
Only genes that had an absolute log₂-fold
change >0.25 and were detected in 10% of
restimulated or resting cells were used for
gene weights. For a given cell, the activation
score was calculated as sum(G_E × G_W/G_M),
where G_E is a gene’s normalized/transformed
expression count, G_W is the gene’s weight, and
G_M is the gene’s mean expression in no-target
control cells (to correct for differential levels of
baseline expression).
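A minimal Python sketch of the activation score follows, under the reading that each gene's expression is multiplied by its weight and divided by its mean expression in no-target control cells. The function names and dictionary inputs are hypothetical, the detection cutoff is interpreted as at least 10% of cells in either condition, and control means are assumed to be positive.

```python
def select_gene_weights(log2fc, frac_restim, frac_rest,
                        min_abs_lfc=0.25, min_frac=0.10):
    """Keep genes with |log2FC| > 0.25 that are detected in >= 10% of
    restimulated or resting cells; the log2FC is used as the weight."""
    return {
        gene: lfc
        for gene, lfc in log2fc.items()
        if abs(lfc) > min_abs_lfc
        and max(frac_restim[gene], frac_rest[gene]) >= min_frac
    }


def activation_score(expression, weights, control_mean):
    """Sum over weighted genes of expression * weight / control mean.

    expression: {gene: normalized expression in one cell}; genes missing
    from the cell are treated as zero.
    """
    return sum(
        expression.get(gene, 0.0) * weight / control_mean[gene]
        for gene, weight in weights.items()
    )
```

Dividing by the control mean keeps highly expressed genes from dominating the score simply because of their baseline abundance.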


Statistical analysis 


All statistical analyses were performed in R 
version 4.0.2 unless otherwise noted. To address
ties in nonparametric tests, Mann-Whitney
U tests were performed using the
wilcox_test function of the coin R package
(version 1.4-1), with default arguments. For
q-value-based multiple-comparisons correction, 
the R qvalue package (version 2.20.0) was used, 
with default arguments.


INTRODUCTION: Although much is known about
plant traits that function in nonhost resistance 
against pathogens, little is known about non- 
host resistance against herbivores, despite its 
agricultural importance, because of the lack 
of fieldwork. Empoasca leafhoppers, serious 
agricultural pests, identify host plants by eaves- 
dropping on unknown outputs of jasmonate 
(JA)-mediated signaling in a native tobacco 
plant that is naturally variable in its JA signal- 
ing. The known sectors of this tobacco plant’s 
specialized defense metabolism are not effec- 
tive against this insect, which calls for an un- 
biased approach. 


RATIONALE: An unbiased forward-genetics ap- 
proach based on the screening of a 26-parent 
recombinant inbred line population in a
natural habitat with native herbivores was
wedded with unbiased transcriptomic and 
mass spectrometry-based metabolomic analy- 
ses of reverse-genetics lines to identify defense 
chemistries produced by this native tobacco 
when probed by leafhoppers. Synthetic biology 
approaches were used to reconstitute these 
chemistries in crop plants. 


RESULTS: The analysis revealed an Empoasca- 
elicited JA-JAZi module that pointed to the 
phenolamide master transcription factor, 
MYB8, as a central genetic hub clustering with 
putrescine-derived phenolamides. Using tobacco
plants silenced for components of JA signaling
(JAZ and MYC2 genes) and phenolamide
biosynthesis, the central role of a MYC2-MYB8-JAZi
branch of JA signaling was confirmed;



Opportunistic leafhopper attack elicits caffeoylputrescine–green leaf volatile defenses. Attack elicits a
JAZi-mediated sector of JA signaling to condense the products of three branches of specialized metabolism 
(green leaf volatile, phenylpropanoid, and polyamine pathways) in a native tobacco plant through a PPO-catalyzed 
and BBL2-mediated Michael addition reaction to produce previously unobserved defense chemistry (CPH) that 
was reconstituted in crop plants for durable nonhost resistance. CP, caffeoylputrescine; AT1, acyltransferase 1.

however, infiltration of MYC2-silenced plants
with known putrescine-derived phenolamides 
did not alter Empoasca preference. Subse- 
quent detailed structural analysis revealed an 
unknown metabolite whose abundance was 
regulated by the MYC2-MYB8-JAZi branch of 
JA signaling and was negatively correlated with 
Empoasca damage. Previous work on this un- 
known metabolite suggested a conjugate of 
caffeoylputrescine with a C-6 aldehyde produced 
during wound-induced lipid peroxidation—a 
process that leads to the formation of green 
leaf volatiles. Metabolite quantitative trait locus 
(mQTL) analysis and coexpression analysis 
pointed to two polyphenol oxidases (PPOs) 
and one berberine bridge enzyme-like 2 (BBL2) 
gene associated with the metabolite’s bio- 
synthesis. The function of the proteins encoded 
by these genes was tested in both in vitro 
[Escherichia coli expression and enzymatic as- 
says with (Z)-3-hexenal and caffeoylputrescine] 
and in vivo (transient expression in Solanum 
chilense and Vicia faba) systems. The structure 
of the unknown metabolite was identified by 
nuclear magnetic resonance (NMR) to be a
caffeoylputrescine-green leaf volatile com- 
pound (CPH), catalyzed by a PPO in a Michael 
addition reaction and requiring BBL2 in planta. 
Synthetic biology approaches confirmed the 
function of CPH in nonhost resistance against 
Empoasca leafhoppers in Nicotiana attenuata 
lines silenced to be defective in CPH production; 
in V. faba, a bean crop host plant of the leaf- 
hoppers unable to produce caffeoylputrescine; 
and in S. chilense. 


CONCLUSION: The natural history-driven multi- 
omics framework used for the discovery of 
CPH and its marriage with synthetic biology 
approaches highlight how readily the results 
of millions of years of innovation by natural 
selection can be amortized and transferred 
to crop plants to catalyze a greener and eco- 
logically more nuanced revolution in plant 
protection. Crop plants face challenges not 
substantially different from those faced by 
native plants; they are constantly tested by 
hidden herbivore communities that challenge 
the host-nonhost distinction. In a world of 
climate change and globally homogenized 
herbivore communities, opportunistic associ- 
ations will dominate natural and man-made 
ecosystems. CPH represents a chemical inno- 
vation that allows a native plant to cope with 
these opportunistic associations and is readily 
engineered in crop plants.


Although much is known about plant traits that function in nonhost resistance against pathogens, little
is known about nonhost resistance against herbivores, despite its agricultural importance. Empoasca 
leafhoppers, serious agricultural pests, identify host plants by eavesdropping on unknown outputs of 
jasmonate (JA)-mediated signaling. Forward- and reverse-genetics lines of a native tobacco plant 
were screened in native habitats with native herbivores using high-throughput genomic, transcriptomic, 
and metabolomic tools to reveal an Empoasca-elicited JA-JAZi module. This module induces an 
uncharacterized caffeoylputrescine-green leaf volatile compound, catalyzed by a polyphenol oxidase in 
a Michael addition reaction, which we reconstitute in vitro; engineer in crop plants, where it requires
a berberine bridge enzyme-like 2 (BBL2) for its synthesis; and show that it confers resistance to
leafhoppers. Natural history–guided forward genetics reveals a conserved nonhost resistance
mechanism useful for crop protection.


Being at the bottom of most terrestrial
food chains, plants are continuously 
attacked by herbivores and pathogens 
(1, 2). Research into plant traits that 
provide resistance against these biotic 
agents has primarily focused on nonhost re- 
sistance to pathogens (3-5) and host resistance 
to herbivores (6, 7). This difference in empha- 
sis likely reflects the greater physiological au- 
tonomy of herbivores, which are selective in 
choosing plants to attack, coupled with the 
challenge of discovering resistance traits of 
hosts that herbivores refuse to attack. Plants 
rendered defenseless by the abrogation of de- 
fense pathways can be attacked by nonhost 
herbivores in nonchoice assays in the labora- 
tory (8, 9). However, these assays do not cap- 
ture the selective procedures by which insects 
choose their host plants in nature, which lim- 
its the inferences that can be drawn from these 
laboratory studies about nonhost resistance. 
Because of the paucity of field studies, the 
mechanisms and metabolic traits underlying 
nonhost resistance against herbivores remain 
largely unknown. 
We found that Nicotiana attenuata plants— 
transformed to silence the signaling that mediates
inducible expression of host resistance
traits—when released into the wild, are con- 
tinuously assessed and attacked by nonhost 
insect herbivores when rendered defenseless 
(10). Among these opportunistic herbivores 
is the Empoasca leafhopper—a major pest 
common to many crops. These leafhoppers 
probe nonhost plants to eavesdrop on a plant’s 
jasmonate (JA) signaling, which is elicited 
upon probing (11). However, N. attenuata’s
portfolio of JA-elicited specialized metabolites,
such as alkaloids, protease inhibitors,
diterpene glycosides (17-HGL-DTGs), and elicited
volatiles, which are effective against host
herbivores, were excluded as nonhost resistance
traits (11). Because many agricultural
pests may be opportunistic herbivores, under- 
standing mechanisms of nonhost resistance 
against insects could accelerate the breed- 
ing of durable resistance in crops. To date, 
the most commonly used strategies to con- 
trol agricultural pests are insecticidal sprays 
and ectopic expression of insecticidal proteins,
both of which have ecological drawbacks
(12, 13).

To uncover the JA-elicited nonhost resistance 
traits of N. attenuata, we adopted a forward- 
genetics strategy. We planted a replicated pop- 
ulation of 650 recombinant inbred lines (RILs) 
from a 26-parent multiparent advanced gen- 
eration intercross (MAGIC) population into 
a native habitat in Arizona, USA (Fig. 1A and 
fig. S1). In this setting, Empoasca leafhop- 
pers are abundant and damage their native 
host cucumbers (Cucurbita foetidissima). JA- 
deficient N. attenuata lines were attacked by 
the leafhoppers (7) at rates that varied within 
the MAGIC populations (Fig. 1A). We quantified
Empoasca attack levels in 1907 individual
plants of 674 RILs and parental lines
of the field-grown MAGIC population and 
constructed a multi-omics dataset based on 
high-throughput analyses of phytohormones, 
transcriptomes, and metabolomes. This multi- 
omics dataset was produced from leaves elic- 
ited by a simulated herbivory treatment. We 
mimicked herbivore attack by treating stand- 
ardized puncture wounds (W) with oral secre- 
tions (OS) of Manduca sexta larvae (W + OS) 
to remove confounding factors caused by the 
stochastic nature of insect attack in nature 
and to capture transiently expressed genes 
and metabolites in these samples from field- 
grown plants (Fig. 1B and fig. S2). 

To analyze associations among the genetic 
and metabolic responses of this multi-omics 
dataset, we first focused on JA signaling- 
related genes and used previously acquired 
knowledge of N. attenuata leaf chemistry 
(14, 15) to construct a coassociation network 
of the JA-dependent module. This network 
considered not only the correlations among 
metabolites, phytohormones, and gene expres- 
sions but also the shared single-nucleotide 
polymorphisms (SNPs) inferred from meta- 
bolic quantitative trait locus (mQTL) or expres- 
sion QTL (eQTL) analyses for each of these 
components (Fig. 1C, fig. S3, and data S1).
This coassociation network revealed that 
JAs and JA-related genes nucleated by the JA- 
regulated phenolamide master transcription 
factor (TF) regulator NaMYB8 (16) formed a genetic
hub that clustered with induced phenolamides,
such as N-coumaroylputrescine (CoP),
N-caffeoylputrescine (CP), N-feruloylputrescine 
(FP), and malonylated 17-HGL-DTGs (Fig. 1C). 
More peripheral to this hub were glycosylated 
17-HGL-DTG precursors, such as lyciumoside 
I, lyciumoside IV, attenoside, and nicotiano- 
side III, and other specialized metabolites, 
such as nicotine, acylsugars, and flavonoids 
(fig. S3). A NaJAZi gene clustered centrally 
to JAs but was distant from NaJAR4 and
NaCOI1, which suggests the engagement of
JA signaling. 

To further resolve the components in the co- 
association network responsible for Empoasca 
susceptibility, we conducted a pairwise corre- 
lational analysis among the omics datasets and 
Empoasca abundance and leaf area damaged 
in the RILs of the MAGIC populations (Fig. 
1D). The putrescine-derived phenolamides 
and malonylated 17-HGL-DTGs were negatively 
correlated with the Empoasca numbers and 
damage, whereas glycosylated 17-HGL-DTG 
precursors were positively correlated. JA- 
related genes NaMYB8, NaLOX3, NaAOC, 
NaOPR3, NaJAR6, and NaWIPK exhibited 
the highest negative correlation scores with 
the Empoasca numbers and damage. There 
was considerable heterogeneity in the expres- 
sion of the JA-related family of JAZ genes, with 
the expression of NaJAZa, NaJAZd, NaJAZf,

a field-grown N. attenuata MAGIC population highlights deviations from
canonical JA signaling for Empoasca leafhopper nonhost resistance. (A) Field 
plantation of the MAGIC RIL population of native N. attenuata tobacco plants in 
their native habitat at the WCCER field station in Arizona, USA. Native opportunistic 
Empoasca leafhopper communities and their feeding damage on leaves are 
illustrated. Leafhoppers attack these plants in a JA-dependent manner and 
preferentially select JA-deficient plants as hosts (images: R. Halitschke, D. Kessler, 
A. Kessler). (B) Schematic of high-throughput phenotyping of phytohormones 
(1816 samples), transcriptomes (350 samples), and metabolomes (1706 samples) 
and Empoasca phenotypes (1907 observations of Empoasca abundance and 
leaf damage) of 674 MAGIC RILs and parental lines in the field. To remove 
confounding factors caused by the stochastic nature of insect attack in nature, 
we mimicked herbivore feeding by immediately applying freshly collected oral 
secretions of M. sexta larvae to standardized puncture wounds in leaves at 
standardized leaf positions. This procedure, referred to as W + OS treatment, 


responses. The phytohormone and transcriptome datasets were collected 1 hour 
after W + OS treatment and the metabolome dataset after 72 hours. PCC, 
Pearson correlation coefficient. (C) Coassociation network built from correlations
among metabolomes, transcriptomes, phytohormones, and SNPs (PCC cutoff, 
P < 0.05), and the top five most significant SNPs for each gene (small gray 
circles), metabolites, or phytohormones from eQTL or mQTL imputations were 
retained. Phytohormones and metabolites of different compound classes and 
JA-related genes are labeled with different colors. JA-Ile, JA signaling genes
(NaAOC, NaMYC2a, NaMYC2b, and NaJAZi), NaMYB8, and phenolamides 
(N-coumaroylputrescine, N-caffeoylputrescine, and N-feruloylputrescine) form
a central cluster and are labeled. See fig. S3 for identities of the more peripheral
nodes. ABA, abscisic acid; SA, salicylic acid. (D) Heatmap of coexpressions 
among metabolites, JA-related genes, and phytohormones with Empoasca 
phenotypes (numbers and area damaged) calculated as pairwise PCC (only 
significant correlations with P < 0.05 are shown with colors). 


NaJAZi, and NaJAZj negatively correlated 
with the Empoasca numbers and damage. 
The expression of JAZ genes known to me- 
diate M. sexta defense responses, such as 
NaJAZh (17), showed no significant correla- 
tions. JA, but not the canonical mediator of 
JA signaling, JA-Ile, was negatively correlated 
with the Empoasca numbers and damage, and
negative correlations were also observed for
the hydroxylated and carboxylated JAs (OH-JA, 
OH-JA-Ile, and COOH-JA-Ile) and JA-valine 
conjugates (JA-Val). These results provided 
forward-genetics confirmation of the central 
role of JA signaling and pointed to downstream
components likely involved in the
Empoasca leafhopper resistance response.


An elicited JA-JAZi module regulates
Empoasca resistance

To further disentangle the intricacies of the
Empoasca-elicited JA signaling sector and
its regulated downstream metabolic signatures
responsible for Empoasca resistance, we
adopted a reverse-genetics approach to examine
the involvement of phenolamides. Isogenic
lines of N. attenuata plants individually RNA
interference (RNAi)-silenced [inverted repeat 
(ir)] or overexpressed (ov) in different JAZ 
genes and NaMYC2 to evaluate JA signaling- 
deficiency; in NaMYB8, the phenolamide 
master TF regulator; and in DH29 and CV86, 
which catalyze spermidine conjugation steps 
in phenolamide biosynthesis, were screened 
in a glasshouse open-choice screening exper- 
iment using laboratory colonies of Empoasca 
decipiens (Fig. 2A and figs. S4 and S5). Nymphs
and adult E. decipiens preferentially selected 
irMYC2, irMYB8, and ovJAZi plants for feed- 
ing and reproduction in contrast to the other 
transgenic lines, which were only slightly dam- 
aged by a few probing events (Fig. 2A). Diverse 
JAZ proteins allow the JA signaling cascade to 
regulate an array of metabolic and develop- 
mental traits in different tissues at different 
times (18, 19), thereby contextualizing responses
and optimizing fitness. The modularity of the 
JA-JAZi sector may provide specific responses 
relevant to Empoasca leafhoppers. A tissue- 
wide transcriptomics analysis of all JAZ genes 
in the N. attenuata genome revealed that 
NaJAZi is highly expressed in flower tissues 
and is not responsive to M. sexta attack in 
leaves (20), whereas NaJAZh showed the op- 
posite pattern (Fig. 2B and fig. S6). These pat- 
terns were confirmed by exposing leaves to 
E. decipiens and M. sexta attack and moni- 
toring the kinetics of the JAZ transcript accu- 
mulations: M. sexta feeding elicited NaJAZh 
transcripts in leaves, whereas Empoasca feed- 
ing elicited NaJAZi transcript in leaves (Fig. 
2B). Yeast two-hybrid (Y2H) assays revealed 
that NaJAZi interacts with NaMYC2a, whereas 
NaMYB8 interacts with NaMYC2b (fig. S7). 
These data reveal that a sector of JA signaling 
involving MYC2, MYB8, and JAZi is engaged 
in Empoasca resistance in leaves. 

To identify the metabolites elicited by this 
JA sector, we reared either E. decipiens adults 
and nymphs or M. sexta larvae on leaves of 
rosette-stage plants of JA signaling-deficient 
transgenic lines (Fig. 2A) as well as on irAOC 
and irCOI1 plants; on irGGPPS, which are deficient 
in 17-HGL-DTG accumulations; and on 
irPMT plants, which are impaired in nicotine 
accumulations (fig. S8). We used an analyti- 
cal and computational workflow (14, 15, 21) 
to collect high-resolution indiscriminant (data- 
independent) tandem mass spectrometry (MS/ 
MS) spectra (termed idMS/MS) from extracts 
of Empoasca- and Manduca-damaged leaves. 
We quantified metabolome specialization (δj 
index), metabolome diversity (Hj index), and 
metabolic specificity of individual metabolites 
(Si index) using an information theory framework 
(21, 22). In the dimensions of information 
theory-processed metabolome specialization 
and diversity, M. sexta attack elicited overall 
higher metabolome plasticity, resulting in 
higher δj scores than those elicited by attack 
by E. decipiens. The different transgenic lines 
showed distinct trajectories of metabolome 
plasticity reprogrammed by the attack of the 
two insect species (Fig. 2C and fig. S9). 
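
The three indices can be illustrated with a minimal sketch. It assumes the commonly used Shannon-entropy definitions (relative peak frequencies per sample, Hj as the entropy of a sample, Si as the average information a metabolite carries about sample identity, and δj as the frequency-weighted mean of Si); the exact normalization used in the cited workflow (21, 22) is not given here, so the function name, the input layout, and the formulas are assumptions for illustration only.

```python
import math


def metabolome_indices(intensities):
    """Compute diversity (Hj), specificity (Si) and specialization (delta_j).

    `intensities` maps a sample name to a list of non-negative peak
    intensities, one per metabolite, in the same order for every sample.
    The definitions below are assumed, not taken from the paper.
    """
    samples = list(intensities)
    n_met = len(intensities[samples[0]])

    # Relative frequency P_ij of metabolite i within sample j.
    freq = {}
    for s in samples:
        total = sum(intensities[s])
        if total <= 0:
            raise ValueError(f"sample {s!r} has no signal")
        freq[s] = [x / total for x in intensities[s]]

    # Mean frequency P_i of each metabolite across all samples.
    mean_freq = [sum(freq[s][i] for s in samples) / len(samples)
                 for i in range(n_met)]

    # Hj: Shannon entropy of the frequency profile of sample j.
    diversity = {s: -sum(p * math.log2(p) for p in freq[s] if p > 0)
                 for s in samples}

    # Si: how unevenly metabolite i is distributed among samples.
    specificity = []
    for i in range(n_met):
        acc = 0.0
        if mean_freq[i] > 0:
            for s in samples:
                ratio = freq[s][i] / mean_freq[i]
                if ratio > 0:
                    acc += ratio * math.log2(ratio)
        specificity.append(acc / len(samples))

    # delta_j: average specificity of the metabolites present in sample j.
    specialization = {s: sum(freq[s][i] * specificity[i] for i in range(n_met))
                      for s in samples}
    return diversity, specificity, specialization
```

Under these assumed definitions, two samples with identical profiles give Si = 0 for every metabolite and δj = 0, whereas two samples that share no metabolites give the maximum Si of log2(2) = 1.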

Focusing on transgenic lines preferred by 
Empoasca, we noticed that the distinct signa- 
tures of metabolome specialization elicited 
by herbivore attack were weaker in irMYC2 
and irMYB8 plants (Fig. 2C and fig. S9). This 
suggested that MYC2 is a master regulator of 
metabolome plasticity in response to insect 
attack and that MYB8-dependent herbivory- 
induced phenolamides make up the metabolic 
sector responsible for the increases in metab- 
olome specialization. The separation of the 
trajectories of metabolome changes elicited by 
Empoasca and Manduca attack in the ovJAZi 
lines, rather than their abolishment, further 
pointed to a small set of metabolites elicited 
by Empoasca attack, regulated by NaJAZi, and 
potentially involved in Empoasca resistance. 

To identify these metabolites, we ranked Si 
scores for metabolite specificity calculated for 
each MS/MS spectrum from the E. decipiens- 
elicited metabolomes from the four transgenic 
lines and linked the Si scores with coexpres- 
sion heatmaps derived from correlations cal- 
culated among individual metabolites and 
Empoasca numbers and damage using the 
global variance generated from all reverse- 
genetics lines used in the feeding experiment 
(Fig. 2D and fig. S9). Phenolamides ranked at 
the top of the metabolic specificity Si scores, 
with the putrescine-derived metabolites among 
the highest, and these were negatively corre- 
lated with Empoasca numbers and damage, in 
contrast to particular 17-HGL-DTGs, quinate 
conjugates, and nicotinic acid, which showed 
positive correlations (Fig. 2D). 
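
The correlation step can be sketched as follows. This is a minimal illustration that computes Pearson correlation coefficients between metabolite abundances and a damage score and ranks metabolites from most negative to most positive; the metabolite names and numbers in the usage note are hypothetical, and the significance filtering (P < 0.05) applied in the study is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(metabolites, damage):
    """Return (name, r) pairs sorted from most negative to most positive r.

    `metabolites` maps a metabolite name to its abundance across plants;
    `damage` holds the damage score of the same plants in the same order.
    """
    pairs = [(name, pearson(values, damage))
             for name, values in metabolites.items()]
    return sorted(pairs, key=lambda pair: pair[1])
```

For example, `rank_by_correlation({"met_A": [9, 6, 3], "met_B": [1, 2, 3]}, [1, 2, 3])` places the hypothetical `met_A` first because its abundance falls as damage rises.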

The putrescine-derived phenolamides, CoP, 
CP, and FP, were reduced in irAOC, irCOI1, 
irMYC2, and irMYB8 lines and selectively decreased 
in ovJAZi plants damaged by Empoasca 
feeding but not in those damaged by Manduca 
feeding, whereas other spermidine-derived 
phenolamides showed similar responses to 
the attacks of the two herbivore species in 
ovJAZi plants (fig. S10). The spermidine-derived 
metabolites could be excluded as mediators 
of Empoasca nonhost resistance on the basis 
of the lack of responses of leafhoppers to the 
irDH29 and irCV86 lines (Fig. 2A). To further 
explore the involvement of CoP, CP, and FP, we 
conducted in vivo Empoasca choice assays by 
individually infiltrating physiologically relevant 
concentrations of synthetic CoP (7 µM), CP 
(100 µM), and FP (10 µM) into leaves of irMYC2 
plants that are devoid of elicited phenolamides 
(Fig. 2E and fig. S11). However, these infiltra- 
tions did not alter the preference of Empoasca 
for irMYC2 plants. In vitro Empoasca direct 
feeding assays conducted with individual 
compounds at physiologically relevant con- 
centrations in glucose solutions revealed no 
significant changes in mortality rates of 
E. decipiens compared with those fed on 
glucose controls (Fig. 2E). These data suggest 
that CoP, CP, and FP were not directly respon- 
sible for Empoasca resistance and that other 
yet-unknown putrescine-derived phenolamide 
metabolites were responsible. 


Multi-omics reveals the defense and its 
three-pronged pathway 


Leaves of herbivore-attacked N. attenuata 
plants grown in the glasshouse accumulate a 
variety of putrescine- and spermidine-derived 
phenolamides (14, 16). We selected 15 RILs 
from the field-based multi-omics dataset of 
the MAGIC population (Fig. 1) that accu- 
mulated high levels of structurally diverse 
OS-induced phenolamides to construct idMS/ 
MS and identify the structures of putrescine- 
derived phenolamides. This effort resulted in 
518 nonredundant idMS/MS spectra (Fig. 3A). 
We performed a biclustering analysis to cluster 
spectra according to fragment [normalized dot 
product (NDP)] and neutral loss (NL)-based 
similarities, which resulted in seven modules 
(Fig. 3A). Module 5, particularly enriched in 
phenolamide-related compounds containing 
caffeoyl or putrescine moieties, was further 
mapped onto a molecular network (Fig. 3A). 
An unknown compound at mass/charge ratio 
(m/z) 347.196 ([M+H]⁺, C₁₉H₂₇N₂O₄⁺) occupied 
the first layer of directly linked network neighbors 
for the two isomers of CP (m/z 251.14) 
because of their shared neutral losses of putrescine 
of Δ88.10 Da and fragment peak at 
m/z 163.04 (C₉H₇O₃⁺) corresponding to the 
caffeoyl moiety (Fig. 3A and fig. S12). The 
idMS/MS for a fragment peak at m/z 259.09 
(C₁₅H₁₅O₄⁺), which resulted from the loss of 
putrescine of the molecular ion, further fragmented 
to m/z 163.04 with a neutral loss of 
96.055 Da (C₆H₈O). This implied that the unknown 
m/z 347.19 is a CP derivative decorated 
with a C₆H₈O residue on the aromatic ring 
of the caffeoyl moiety (fig. S12), which had 
previously been associated with JA signaling 
in natural accessions of N. attenuata (14). 
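
The mass arithmetic behind this fragment assignment can be checked with a short sketch. It assumes standard monoisotopic atomic masses and treats the formulas above as given; the helper names `mono_mass` and `ion_mz` are hypothetical and only the elements C, H, N, and O are handled.

```python
import re

# Monoisotopic masses (Da) of the elements needed here.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
ELECTRON = 0.00054858  # electron mass (Da), removed once per positive charge


def mono_mass(formula):
    """Monoisotopic mass of a neutral formula such as 'C6H8O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


def ion_mz(ion_formula, charge=1):
    """m/z of a cation whose formula already includes any added protons."""
    return (mono_mass(ion_formula) - charge * ELECTRON) / charge


precursor = ion_mz("C19H27N2O4")   # [M+H]+ of the unknown compound
after_put = ion_mz("C15H15O4")     # fragment after loss of putrescine
caffeoyl = ion_mz("C9H7O3")        # caffeoyl fragment

# Neutral losses are differences between consecutive fragment ions.
loss_putrescine = precursor - after_put   # expected to match C4H12N2
loss_c6 = after_put - caffeoyl            # expected to match C6H8O

print(f"precursor m/z {precursor:.3f}")
print(f"putrescine loss {loss_putrescine:.3f} vs {mono_mass('C4H12N2'):.3f}")
print(f"C6 residue loss {loss_c6:.3f} vs {mono_mass('C6H8O'):.3f}")
```

The computed values agree with the reported m/z values and neutral losses to within a few millidaltons, which supports reading the 96-Da loss as a C₆H₈O residue.
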
To test whether the unknown m/z 347.19 
metabolite is regulated by the specific JA-JAZi 
module, we explored the coassociation network 
and conducted coexpression analyses for in- 
duced m/z 347.19 against Empoasca num- 
bers and damage and JAs in the field-planted 
MAGIC population (Fig. 1). The m/z 347.19 was 
negatively correlated with Empoasca damage 
but positively correlated with JA, JA-Ile, and 
JA-Val (fig. S13). We then mined the Empoasca- 
induced metabolomes of JA-deficient transgenic 
lines (Fig. 2). However, with similar computa- 
tional workflows, we were unable to identify 
this compound in the dataset described earlier 
(Fig. 2). Extensive experimentation revealed 
that field-planted RILs elicited more of this 
unknown compound than glasshouse-grown 
RILs (data S2) and that seemingly minor 
differences in leaf sampling and extraction 
technique, potentially the simultaneous exposure 
of leaves to aluminum foil and liquid 
N₂ and unfavorable pH conditions (table S1), 
resulted in the loss of this unstable phenolamide 
in plants grown and sampled under 
glasshouse conditions. 

Fig. 2. Reverse genetics coupled with information theory-based unbiased 
metabolomics reveal an Empoasca-elicited JA-JAZi module regulating induced 
unknown putrescine-containing phenolamides correlated with Empoasca 
nonhost resistance. (A) Phenotypes of Empoasca-damaged leaves on transgenic 
N. attenuata lines. Plants (n = 10) were randomly placed in an open-choice glasshouse 
environment containing Empoasca leafhoppers (fig. S4). Representative leaves are 
shown for each genotype, with insets highlighting the damage of ovJAZi and irMYB8 
lines. Different letters indicate significant differences [P < 0.001, one-way analysis of 
variance (ANOVA) followed by Tukey's post hoc multiple comparisons]. (B) Tissue-specific 
expression profiles of NaJAZh and NaJAZi in N. attenuata WT (left); heatmap 
coloring depicts the Z-score-scaled transcripts per million (TPM). Kinetics of 
relative transcript accumulations of NaJAZh and NaJAZi in leaves of N. attenuata 
(n = 3) in response to continuous E. decipiens and M. sexta feeding with samples 
harvested at 0 to 24 hours after the start of feeding. FLB, flower bud; STI, stigma; COE, 
corolla early; PED, pedicel; SED, seed; OFL, opening flower; SNP, style; LEC, leaf 
control; ROT, roots from OS-treated plants; OVA, ovary; NEC, nectary; ANT, anther; 
STT, stem treated; LET, leaf treated. (C) Scatterplots of specialization (δj) versus 
diversity (Hj) indices for specialized metabolomes of leaves (n = 4) after 72 hours of 
feeding by E. decipiens and M. sexta on four transgenic lines selected from a set of 
16 transgenic lines (see fig. S9 for the complete dataset). An increase in metabolome 
specialization (δj) indicates that, on average, more herbivory or genotype-specific 
metabolites are produced, whereas an increase in metabolome diversity (Hj) indicates 
that either qualitatively more metabolites are produced or that quantitatively 
the global metabolic frequency profile is more uniformly distributed. Colors denote 
different insects, and symbols denote different treatments: triangles indicate insect 
feeding, and circles indicate untreated controls. (D) A ranked metabolite specificity 
(Si) index distribution plot was calculated for each metabolite on the basis of 
E. decipiens specifically elicited metabolomes from the four transgenic lines shown in 
(C) (left) and further linked with a PCC coexpression heatmap among metabolites 
and E. decipiens number and damage phenotypes (right) for which the data from all 
reverse-genetics lines shown in (A) were used to enhance the statistical power of 
the PCC calculations. Dots are colored on the basis of compound class annotations. 
Only significant correlations with P < 0.05 are shown with colors. (E) In vivo Empoasca 
choice assays (n = 8) conducted by infiltrating synthetic CoP, CP, or FP diluted in 0.1% 
dimethyl sulfoxide (DMSO) solutions into irMYC2 leaves (top) and in vitro Empoasca 
nonchoice assays (n = 3, 25 Empoasca leafhoppers per replicate) by feeding Empoasca 
with synthetic CoP, CP, or FP diluted in 10% glucose solutions (bottom) revealed that 
these phenolamides were not affecting leafhopper behavior or performance. 

Bai et al., Science 375, eabm2948 (2022) 4 February 2022 

Fig. 3. Elucidating an herbivory-elicited GLV-caffeoylputrescine metabolite 
and its three-pronged biosynthetic pathways by combining MS/MS structural 
metabolomics with forward and reverse genetics. (A) (Left) Biclustering 
of 518 idMS/MS spectra constructed from 15 RILs of the field-planted MAGIC 
population based on shared fragments (NDP-based similarity) and shared 
neutral losses (NL-based similarity) reveals seven distinct modules (M1 to M7) in 
the molecular network. (Right) Close-up of module 5 in which putrescine- or 
caffeoyl-derived phenolamides are enriched harboring an unknown metabolite 
m/z 347.19 that is directly linked to two isomers of CP (circled in green). 
CS, N-caffeoylspermidine; CoCS, N',N''-coumaroyl, caffeoylspermidine; CFS, 
N',N''-caffeoyl, feruloylspermidine; DCS, N',N''-dicaffeoylspermidine; CPD, 
caffeoylputrescine dimer; CGA, chlorogenic acid; Unk., unknown. (B) Accumulations 
of m/z 347.19 in Empoasca-elicited EV and irMYC2 lines (top) and MeJA-induced 
EV, irMYC2, irMYB8, and ovJAZi lines (bottom). (C) (Left) Manhattan plot for 
herbivory-induced unknown m/z 347.19 from an mQTL analysis of W + OS-elicited 
leaves from the MAGIC RIL population grown in the glasshouse and extracted 
with procedures that minimize losses of the phenolamide sector. Core JA signaling 
gene, NaMYC2a; phenolamide regulator, NaMYB8; CP biosynthetic gene, NaAT1; 
and two unknown biosynthetic candidate genes, NaPPO1 and NaPPO2, were 
imputed in the mQTL analysis (P value cutoffs = 10-) as well as an uncharacterized 
candidate gene, NaBBL2 (P = 0.0013). (Right) Gene coexpression network 
constructed using a previously published microarray dataset of irMYB8 plants 
harvested 1 and 5 hours after W + OS elicitation. The phenolamide biosynthetic 
genes, NaAT1 and NaDH29, were used as baits (diamonds). Yellow dots depict 
genes coexpressed with both baits, whereas the green (NaAT1) and blue (NaDH29) 
dots depict genes coexpressed with a single bait. (D) VIGS of biosynthetic gene 
candidates involved in m/z 347.19 production. Silencing NaPPO1, NaPPO2, 
NaAT1, and NaBBL2 expression abolished the elicitation of m/z 347.19 by W + OS 
treatment observed in EV control plants (C indicates untreated controls). 
(E) Proposed three-pronged biosynthetic pathway for the Empoasca-elicited 
m/z 347.19 production, which requires the LOX2-HPL-dependent C6 GLV 
metabolism, LOX3-dependent and JA-regulated phenylpropanoid metabolism, 
and polyamine metabolism, the outputs of which are putatively conjugated 
in NaBBL2- and NaPPO1/2-dependent reactions. (F) Scatterplots of metabolite 
abundance of m/z 347.19 against (Z)-3-hexen-1-ol volatile emissions in MAGIC 
accessions from the glasshouse (top) and m/z 347.19 accumulation after 
W + OS treatment in leaves of stably transformed EV, asHPL, irLOX2, and 
irLOX2xirLOX3 lines (bottom). 

We reared Empoasca on irMYC2 plants again 
and, by optimizing extraction conditions, found 
that Empoasca feeding strongly elicited m/z 
347.19 accumulations in empty vector (EV)-transformed 
plants; these accumulations were 
abolished in irMYC2 lines (Fig. 3B). During 
this extraction optimization effort, we real- 
ized that the Empoasca leafhopper elicitation 
procedure could be replaced with the experi- 
mentally more tractable elicitations by larval 
oral secretions or methyl jasmonate (MeJA). 
Consistently, MeJA-induced production of m/z 
347.19 was hampered in irMYC2, but also in 
irMYB8 and ovJAZi lines (Fig. 3B). These results 
revealed that the unknown m/z 347.19 is 
regulated by the JA-JAZi-MYC2-MYB8 signaling 
sector, which is likely responsible for Empoasca 
resistance. 

To investigate the biosynthetic origins of 
m/z 347.19, we extracted OS-elicited leaves 
of the entire MAGIC RIL population grown 
under glasshouse conditions and phenolamide- 
permissive conditions and conducted an mQTL 
analysis (Fig. 3C). The analysis imputed a series 
of genes (with P values <10~*) known to be 
involved in the regulation and biosynthesis of 
CP and m/z 347.19, including NaMYC2a, NaMYB8, 
and NaAT1, which encodes a 
hydroxycinnamoyl-coenzyme A:putrescine acyltransferase 
responsible for CP biosynthesis (16), as well as 
two polyphenol oxidases (PPOs), NaPPO1 and 
NaPPO2, which are located on chromosomes 7 
and 8, respectively (Fig. 3C). Coexpression 
analyses of a microarray dataset of irMYB8 
lines (23) using NaAT1 and NaDH29 as baits 
revealed a cluster of genes in the NaAT1 groupings 
known to be involved in the biosynthesis 
of CP, including NaPAL1, NaPAL2, Na4CL1, 
and NaC3H. A berberine bridge enzyme-like 
(BBL) gene, NaBBL2, was highly coexpressed 
with NaAT1 and decreased in its induced expression 
in irMYB8 lines (Fig. 3C and fig. 
S14A). We revisited the mQTL dataset and 
found that NaBBL2, located on chromosome 3, 
was associated with m/z 347.19 (Fig. 3C), al- 
beit at reduced statistical significance (P = 
0.0013). Time-resolved microarray data (24) 
of herbivory-elicited expression of NaAT1, 
NaPPO2, and NaBBL2 revealed that NaAT1 
and NaPPO2 showed similar induction pat- 
terns in N. attenuata wild type (WT), whereas 
NaBBL2 was highly induced at 1 hour and 
retained its induction at later time points, 
albeit at reduced levels (fig. S14B). Moreover, the 
herbivory-elicited inductions of NaAT1, NaPPO1, 
NaPPO2, and NaBBL2 were reduced in an RNA 
sequencing (RNA-seq) transcriptome dataset of 
irMYC2 lines (fig. S14C). 

Although the number and order of the bio- 
synthetic steps for m/z 347.19 accumulations 
remained elusive, we hypothesized that pos- 
sible oxidation and acylation reactions were 
likely required. We therefore focused on oxi- 
dases and acyltransferases collectively im- 
puted from the multi-omics analysis. Candidate 
genes included three acyltransferases, NaAT1, 
NaAT2, and NaAT3; three polyphenol oxidases, 
NaPPO1, NaPPO2, and NaPPO3; and a BBL 
gene, NaBBL2. We evaluated the in vivo func- 
tions of the candidate genes as well as NaAT1 
as a positive control by silencing their expres- 
sion in N. attenuata using virus-induced gene 
silencing (VIGS). Consistent with a previous 
analysis, VIGS of NaAT1 abolished m/z 347.19 
accumulation (14). Similarly, silencing NaPPO1, 
NaPPO2, and NaBBL2 expression also truncated 
the elicitation of m/z 347.19 (Fig. 3D and 
fig. S15). Untargeted metabolomics analysis 
of the VIGS-silenced plants revealed that 
silencing NaPPO1 and NaPPO2 truncated 
m/z 347.19 accumulations without changes 
in other phenolamides (Fig. 3D and fig. S16). 
These results suggest that NaPPO1, NaPPO2, 
and NaBBL2 are required for the in vivo pro- 
duction of m/z 347.19. 

Previous analysis has suggested that the 
additional C₆H₈O residue of m/z 347.19 is 
produced from the fatty acid oxylipin cascade, 
which converts C18 polyunsaturated fatty acids 
released from biological membranes during 
stresses, wounding, and herbivory (25) to pro- 
duce green leaf volatiles (GLVs) enriched in 
reactive C6 derivatives (14). We measured 
herbivory-elicited GLVs in the same glasshouse- 
grown MAGIC RIL population used for the im- 
putation (Fig. 3) and conducted correlational 
analysis with herbivory-elicited m/z 347.19 ac- 
cumulations. (Z)-3-hexenal-derived volatiles, 
such as (Z)-3-hexenyl-propanoate and (Z)-3- 
hexenol, were the most significantly positively 
correlated metabolites with m/z 347.19, where- 
as 1-hexanol, linalool, and other elicited vola- 
tiles were not (Fig. 3F and fig. S17). 

The C6 aldehydes, with their molecular formula 
of C₆H₈O, are the most reactive aldehydes 
produced from the GLV pathway and 
have been hypothesized (14) to be the missing 
substrates for the biosynthesis of m/z 347.19. 
Consistent with previous analyses (26), stably 
silencing LIPOXYGENASE2 (irLOX2) in 
N. attenuata, which controls the first committed 
step in the GLV pathway, abolishes C6 alde- 
hydes production and total GLV emissions 
(14), and stably silenced crosses of irLOX2 
and irLOX3 (irLOX2xirLOX3) completely eliminated 
m/z 347.19 production (Fig. 3F). Additionally, 
silencing NaHPL (with an antisense 
construct, asHPL), which catalyzes the formation 
of the initial C6 GLV product, (Z)-3-hexenal, 
and its isomer, (E)-2-hexenal, results 
in considerable time-dependent reductions 
of GLVs in N. attenuata (27) and reduced m/z 
347.19 accumulations to ~% of the WT levels 
(Fig. 3F). From these results, we surmised that 
in response to Empoasca probing, m/z 347.19 
is produced by a three-pronged metabolic 
pathway composed of the LOX2-HPL-GLV 
pathway, the LOX3-JA-regulated phenylpro- 
panoid pathway, and the polyamine pathway, 
which are condensed by NaPPO1, NaPPO2, 
and NaBBL2 using CP and (Z)-3-hexenal or 
(E)-2-hexenal to produce m/z 347.19 (Fig. 3E). 

To test this hypothesis, we isolated purified 
NaPPO1, NaPPO2, and NaBBL2 proteins with 
N-terminal hexahistidine tags after expres- 
sion in Escherichia coli (fig. S18). Incubation 
of either NaPPO1 or NaPPO2 (NaPPO1/2) with 
CP and (Z)-3-hexenal yielded a m/z 347.19 peak 
with a MS/MS spectrum and retention time 
identical to that of the m/z 347.19 induced in 
N. attenuata leaves (fig. S19). Additionally, 
by-products of doubly charged CP dimers at 
m/z 250.13 ([M+2H]²⁺, C₂₆H₃₆N₄O₆²⁺), which 
are not detected in OS-induced leaves of 
N. attenuata WT (data S2), were also produced 
in vitro (Fig. 4A and fig. S20). These doubly 
charged CP dimers were only produced when 
the quantities of the (Z)-3-hexenal substrate 
were lower than those of the CP substrate 
in vitro. NaPPO1/2 showed little-to-no activity 
when incubated with CP and (E)-2-hexenal 
(Fig. 4A). NaBBL2 alone could not use CP 
and (Z)-3-hexenal as the substrates to produce 
m/z 347.19, and under in vitro conditions, 
the addition of NaBBL2 in the presence of 
NaPPO1/2, CP, and (Z)-3-hexenal did not significantly 
increase the production of m/z 347.19 
(figs. S21 and S22 and table S2). As PPOs have 
broad substrate specificities that can accept both 
hydroxy benzenes and/or ortho-dihydroxylated 
benzenes as substrates (28), we further ex- 
plored the substrate specificities of NaPPO1 
and NaPPO2 by incubating NaPPO1/2 and 
(Z)-3-hexenal with CoP or with chlorogenic 
acid (CGA), which contains the same aromatic 
dihydroxylation pattern as CP. However, no 
new products were found, which indicates that 
CoP and CGA are not accepted as substrates by 
NaPPO1/2 (fig. S23). Together, these data reveal 
that NaPPO1/2 accepts CP and (Z)-3-hexenal as 
substrates to produce m/z 347.19 in vitro. 


Biosynthetic logic of the reactive 
m/z 347.19 chemistry 


To elucidate the chemical structure of m/z 
347.19, we attempted to isolate and purify 
the m/z 347.19 using induced N. attenuata leaf 
material and enzyme assay-derived products. 
However, several attempts failed because of the 
instability of m/z 347.19. Although relatively 
stable in ammonium-acetate buffer (pH 4.8), 
when concentrated either by rotatory evap- 
oration or freeze drying, m/z 347.19 rapidly 
decomposed (table S1). These observations 
indicate that m/z 347.19 is reactive and un- 
stable at high pH. We modified the purification 
procedures for m/z 347.19 to produce large 
quantities from enzymatic assays under weak 
acidic conditions and purified m/z 347.19 using 
solid-phase extraction under argon atmo- 
spheres. The purified m/z 347.19 was then sub- 
jected to nuclear magnetic resonance (NMR) 
analysis, which elucidated its structure as 
a CP-5-(Z)-3-hexenal compound (hereafter 
referred to as CPH) (fig. S24 and data S3). 
CPH’s half-life is only ~22 hours in acidified 
methanol-ds (0.1% formic acid) at room tem- 
perature in darkness (fig. S25 and data S3). 
CPH contains both the reactive moiety of an 
α,β-unsaturated aldehyde derived from (Z)-3-hexenal, 
which is electrophilic, and an amine 
feature of CP, which is nucleophilic. 
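
To give a sense of what a ~22-hour half-life means for sample handling, the following sketch estimates the fraction of CPH remaining after a given storage time. It assumes simple first-order decay, which is an assumption made here for illustration and is not stated for the measurements in fig. S25; the function name is hypothetical.

```python
def fraction_remaining(hours, half_life_hours=22.0):
    """Fraction of compound left after `hours`, assuming first-order decay."""
    if hours < 0 or half_life_hours <= 0:
        raise ValueError("hours must be >= 0 and half-life must be > 0")
    return 0.5 ** (hours / half_life_hours)
```

Under this assumption, half of the compound is left after 22 hours and a quarter after 44 hours, so `fraction_remaining(44)` evaluates to 0.25.
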

CPH results from the biochemical union of 
so-called direct (CP) and indirect [(Z)-3-hexenal] 
defense metabolism, and we hypothesized 
that CPH was the metabolic trait underlying 
Empoasca nonhost resistance. We suggest 
two possible mechanisms of action: The rapid 
polymerization of the electrophilic and nucle- 
ophilic groups could occlude the mouthparts 
of probing Empoasca leafhoppers, or the 
α,β-unsaturated aldehyde may function as 
a protein cross-linker that disables Empoasca 
proteins (29). We propose a three-step biosynthetic 
mechanism for the production of CPH: 
NaPPO1/2 oxidizes CP to the corresponding 
caffeoyl quinone derivative and activates (Z)-3-hexenal 
for a Michael addition reaction, the 
product of which is aromatized to form CPH 
(Fig. 4B). 

Fig. 4. Structural elucidation, biosynthesis, function, and engineering 
of m/z 347.19 in vitro and in planta. (A) In vitro enzymatic assays for 
m/z 347.19 production 
347.19 by expressing NaPPO1, NaPPO2, and NaBBL2 with Z3H infiltrations in MeJA-elicited S. chilense leaves. m/z 347.19 accumulates only in S. chilense expressing 
NaPPO1/2 together with NaBBL2, but not NaPPO1/2 alone. V. faba can be engineered to produce m/z 347.19 by expressing NaPPO1, NaPPO2, and NaBBL2 with 
Z3H and CP infiltrations in leaves. Also shown are mortality rates of in vivo Empoasca feeding assays (n = 4, 25 Empoasca leafhoppers per replicate) in which 
Empoasca were fed for 10 hours on leaves of reconstituted S. chilense and V. faba, respectively. 


CPH is responsible for Empoasca resistance 


To test whether CPH is responsible for Empoasca 
resistance, we fed E. decipiens with physiologically 
relevant concentrations of 1 µM (estimated 
from field-collected elicited leaves) of 
NMR-confirmed CPH in diets containing 10% 
glucose in vitro. After 6 hours of feeding, the 
CPH treatment caused almost 100% mortal- 
ity of E. decipiens, in contrast to leafhopper 
growth on control diets (P = 3 x 10-*; Student’s 
t test) (Fig. 4C). We further silenced NaAT1 
expression in N. attenuata plants using VIGS, 
which disrupted the production of both CP 
and CPH (14) (Fig. 3D). In vivo choice assays 
revealed that NaAT1-silenced plants received 
significantly more Empoasca damage and 
higher Empoasca numbers than EV plants 
(Fig. 4C). Similarly, silencing either NaPPO1 
or NaPPO2 in plants abolished only the accu- 
mulation of CPH in plants, without significant 
alterations in other phenolamide pools (Fig. 3D 
and fig. S16), and resulted in a clear Empoasca 
feeding preference (P = 0.013 and P = 0.022, 
respectively) and greater Empoasca damage 
(P = 0.008 and P = 0.008, respectively) com- 
pared with that observed in EV plants (Fig. 4C). 
Together, these in vitro and in vivo results 
suggest that CPH in N. attenuata is responsible 
for the plant’s Empoasca nonhost resistance. 


NaBBL2 is required to engineer CPH 
biosynthesis in crop species 


The discovery of CPH and its biosynthetic 
pathways underlying nonhost resistance offers 
a framework for the engineering of CPH bio- 
synthesis in crop plants as a means of opti- 
mizing a plant’s endogenous metabolism for 
defense against the attack of devastating 
leafhopper pests, the diseases they vector, 
and other nonhost pests. We investigated 
whether CPH is widely found in Solanaceae 
and other plant taxa. Metabolic profiling of 
N. attenuata’s close relatives revealed that 
six of seven Nicotiana species induced CPH in 
a coordinated fashion with CP when elicited 
by MeJA (fig. S26A). We selected 13 different 
taxa from different plant families, including 
several crop species, and compared amino acid 
identities of NaAT1, NaPPO1, NaPPO2, and 
NaBBL2 with their closest homologs using 
Basic Local Alignment Search Tool (BLAST) 
searches of the National Center for Biotech- 
nology Information (NCBI) sequence database 
(fig. S26B). Five Solanaceae taxa of the 13 spe- 
cies examined contain all orthologs of the four 
protein sequences. Moreover, we detected 
MeJA-induced CP in eight species, including 
seven Solanaceae species and wheat, where- 
as only two species other than N. attenuata, 
Capsicum annuum and Nicotiana benthamiana, 
produced CPH (fig. S26B). These results sug- 
gest that CPH production may be restricted to 
the Solanaceae. 

Synthetic biology has enabled the transfer 
of metabolic pathways among taxa because 
of shared cofactors and metabolism (30-33). 
We attempted to reconstitute the CPH pathway 
in vivo (Fig. 4D). We selected Vicia faba and 
Solanum chilense for Agrobacterium-mediated 
transient expression of the CPH pathway for 
several reasons. Neither species accumulated 
CPH in untreated and MeJA-treated tissues. 
V. faba is an ideal host plant for Empoasca 
rearing. CoP, CP, and FP do not accumulate in 
V. faba, whereas CP levels are induced by 
MeJA treatment of S. chilense, which provides 
an internal precursor for CPH production (Fig. 
4E). Moreover, both are readily transformed 
and likely to produce correctly folded active 
proteins with which to test biochemical ac- 
tivities of CPH biosynthetic genes in planta. 

We transiently coexpressed NaPPO1 or 
NaPPO2 together with (Z)-3-hexenal and CP 
leaf infiltrations in V. faba or without CP infil- 
trations in S. chilense. However, we failed to 
detect any CPH in either species (Fig. 4E). 
PPOs are generally localized to plastids, phys- 
ically separating them from their phenolic 
substrates, which are known to be localized 
in vacuoles (34). In N. attenuata, a thylakoid 
transfer domain was identified in both NaPPO1 
and NaPPO2 N-terminal sequences (fig. S27A). 
Transient expression of green fluorescent protein 
(GFP)-tagged NaPPO1 and NaPPO2 in 
N. attenuata leaves confirmed that both NaPPO1 
and NaPPO2 are plastid localized (fig. S27B). 

Our three-pronged pathway proposal for 
CPH biosynthesis is therefore challenged by 
the separate enzymatic localizations of the 
different components—CP (likely vacuolar or 
cytosolic) (34, 35), GLVs, JAs, and PPOs (plas- 
tidial) (34, 36). This challenge was reminiscent 
of nicotine biosynthesis, which requires a 
BBL gene to join the mitochondria-localized 
pyridine ring, derived from nicotinic acid, 
with the pyrrolidine ring, derived from the 
peroxisome-localized N-methylpyrrolinium 
cation, to produce nicotine (37). We hypothe- 
sized that NaBBL2 is required for the produc- 
tion of CPH in vivo, and to test this hypothesis, 
we expressed NaBBL2 along with NaPPO1 or 
NaPPO2 in S. chilense plants. One day after 
Agrobacterium infiltration, S. chilense plants 
were treated with MeJA to induce CP produc- 
tion, and 3 days later, we infiltrated leaves with 
(Z)-3-hexenal. After 6 hours, we harvested leaves 
for liquid chromatography-mass spectrometry 
(LC-MS) analysis and found that the leaves 
had accumulated substantial quantities of CPH 
(Fig. 4E). For V. faba plants, we expressed 
NaBBL2 along with NaPPO1 or NaPPO2; 3 days 
after Agrobacterium infiltration, we infiltrated 
leaves with both CP and (Z)-3-hexenal and har- 
vested leaves for LC-MS analysis after 6 hours. 
Again, CPH accumulated (Fig. 4E). From these 
results, we infer that NaBBL2, although not 
required for in vitro synthesis, is required for 
in vivo CPH biosynthesis. Additional work is 
required to evaluate whether NaBBL2 plays a 
role in solving the localization challenge, which 
could have other possible solutions (figs. S28 
and S29 and supplementary materials). Finally, 
we conducted Empoasca feeding trials on 
the CPH-engineered V. faba and S. chilense 
plants and observed that these Empoasca 
host crop plants become lethal host plants 
for Empoasca (Fig. 4E). 

This mechanistic analysis of Empoasca non- 
host resistance provides another example of 
the innovative chemical solutions that native 
plants have evolved to solve their ecological 
challenges (38). The natural history-driven 
multi-omics framework that we used for the 
discovery of CPH and its marriage with syn- 
thetic biology approaches highlights how read- 
ily the results of millions of years of innovation 
by natural selection can be transferred to our 
crop plants to catalyze the next, greener, and 
ecologically more nuanced revolution in plant 
protection (39) and domestication (40-42). 
Crop plants face challenges not substantially 
different from those of native plants, being 
constantly tested by an herbivore community 
that challenges the host-nonhost distinction. 
In a world of climate change and globally homogenized
herbivore communities, opportunistic associations
will dominate natural and
man-made ecosystems. Insight into how native 
plants cope with opportunistic associations will 
help us to design crops that are more resilient 
in the face of unknown stresses as the world’s 
climate changes (43). 


Materials and methods summary 


Two replicates of the 650 RILs from a 26-parent 
MAGIC population and their 26 parental lines 
were planted at the WCCER field station in 
Prescott, Arizona, USA. To elicit a standardized 
herbivory response, leaves of all RILs, which 
were in the early flowering stage, were wounded 
and immediately treated with diluted M. sexta 
oral secretions (W + OS) or were left untreated 
(control). Leaves were then harvested on dry 
ice at 1 and 72 hours. One week after metab- 
olite sampling, all plants of the field popu- 
lation were screened for natural Empoasca 
leafhopper numbers and damage. These leaf- 
hoppers had opportunistically sampled the 
N. attenuata plants from neighboring native 
cucumber host plants. The mQTL and eQTL 
mapping between SNPs and the relative abun- 
dance of each compound or transcript using a 
set of 646 RILs of the MAGIC population was 
done with the R package software GAPIT using 
general linear models (GLMs). The multi-omics 
coassociation network was built from corre- 
lations among metabolomes, transcriptomes, 
phytohormones, and SNPs. For Empoasca 
choice assays, transgenic lines of N. attenuata 
at the early rosette growth stage were ran- 
domly placed in an open-choice glasshouse 
environment containing Empoasca leafhoppers 
reared on bean plants in the MPI-CE glass- 
house in Isserstedt, Germany. Y2H assays 
and quantitative reverse transcription poly- 
merase chain reaction (qRT-PCR) were used 
for characterizing Empoasca-induced JA sig- 
naling genes. Compound-specific idMS/MS 
was constructed using ultrahigh-performance 
liquid chromatography-electrospray ionization 
(UHPLC-ESI)-quadrupole time-of-flight mass 
spectrometry (qTOF-MS) for idMS/MS acquisi- 
tion and rule-based computational approaches 
for idMS/MS assembly. Metabolome diversity 
and specialization and metabolic specificity 
were calculated using information theory by 
considering the Shannon entropy of the idMS/ 
MS frequency distributions. In vivo Empoasca 
choice and in vitro Empoasca feeding assays 
were conducted by infiltrating synthetic CP, 
CoP, or FP into irMYC2 leaves or by feeding 
Empoasca with the compounds diluted in 
10% glucose solutions. Fifteen RILs, which in- 
duced putrescine-containing phenolamides 
after OS elicitation and accumulated a di- 
verse set of known and unknown phenol- 
amides, were used to construct idMS/MS for 
MS/MS structural metabolomics analysis. 
MS/MS similarity scoring, biclustering, and
molecular networking were used to identify
the unknown m/z 347.19 metabolite. OS-induced
volatile emissions were collected
using polydimethylsiloxane (PDMS) tubing 
from 650 MAGIC RILs planted in the main 
MPI-CE glasshouse and analyzed by thermal 
desorption-gas chromatography-mass spectrometry
(TD-GC-MS). NaPPO1/2 and NaBBL2
genes were elucidated by combining mQTL 
analysis for herbivory-induced unknown 
m/z 347.19 and transcriptomics analysis of 
the microarray and RNA-seq datasets of 
OS-induced kinetics of WT and irMYC2 and 
irMYB8 lines. The candidate genes were functionally
validated by VIGS and in vitro enzymatic
assays using E. coli-expressed NaPPO1,
NaPPO2, and NaBBL2 with CP and (Z)-3- 
hexenal. The CPH [CP-5-(Z)-3-hexenal] chem- 
ical structure was characterized by NMR. CPH’s 
resistance function against Empoasca was 
tested by in vitro nonchoice assays with syn- 
thesized CPH or by in planta choice assays 
conducted with VIGS plants of EV, NaPPO1,
NaPPO2, and NaAT1. The biosynthetic pathway
of CPH was reconstituted in V. faba and
S. chilense by transiently coexpressing NaPPO1, 
NaPPO2, and NaBBL2 with CP and (Z)-3- 
hexenal leaf infiltrations. The nonhost resist- 
ance function of CPH against Empoasca was 
further evaluated with the CPH-engineered 
V. faba and S. chilense plants. 
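
The information-theory step in the summary above, Shannon entropy of idMS/MS frequency distributions, can be illustrated with a minimal sketch. The function name and the toy intensity lists are hypothetical. The exact normalization used in the study is not given here, so relative frequencies summing to 1 and a base-2 logarithm are assumed.

```python
import math

def shannon_entropy(intensities):
    """Shannon entropy (bits) of a non-negative intensity vector.

    Assumption: each entry is the abundance of one spectrum in one sample;
    values are normalised to relative frequencies before use.
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must have a positive sum")
    freqs = [x / total for x in intensities if x > 0]
    return sum(-p * math.log2(p) for p in freqs)

# Toy profiles (hypothetical numbers): an even metabolome is more diverse
# than one dominated by a single compound.
even_profile = [25, 25, 25, 25]
skewed_profile = [97, 1, 1, 1]
print(shannon_entropy(even_profile))    # four equally abundant spectra: 2 bits
print(shannon_entropy(skewed_profile))  # much lower, dominated by one spectrum
```

A higher value indicates a more even, and in this sense more diverse, profile.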


Critical assessment of DNA adenine methylation in
eukaryotes using quantitative deconvolution 


Yimeng Kong1, Lei Cao1†, Gintaras Deikus1†, Yu Fan1†, Edward A. Mead1†, Weiyi Lai2, Yizhou Zhang3,
Raymund Yong3, Robert Sebra1,4,5, Hailin Wang2, Xue-Song Zhang6, Gang Fang1*


The discovery of N6-methyldeoxyadenine (6mA) across eukaryotes led to a search for additional
epigenetic mechanisms. However, some studies have highlighted confounding factors that challenge
the prevalence of 6mA in eukaryotes. We developed a metagenomic method to quantitatively
deconvolve 6mA events from a genomic DNA sample into species of interest, genomic regions,
and sources of contamination. Applying this method, we observed high-resolution 6mA deposition in
two protozoa. We found that commensal or soil bacteria explained the vast majority of 6mA in
insect and plant samples. We found no evidence of high abundance of 6mA in Drosophila,
Arabidopsis, or humans. Plasmids used for genetic manipulation, even those from Dam
methyltransferase mutant Escherichia coli, could carry abundant 6mA, confounding the evaluation
of candidate 6mA methyltransferases and demethylases. On the basis of this work, we advocate
for a reassessment of 6mA in eukaryotes.


For decades, N6-methyldeoxyadenine (6mA)
has been known to be widespread in
prokaryotes as a regulator of DNA replication,
repair, and transcription (1-3).
Recently, 6mA has also been reported 
to be prevalent in eukaryotes. Unlike the gen- 
erally high abundance of 6mA in bacteria, 
6mA/A levels (6mA events relative to all ade- 
nines) in eukaryotic organisms vary over several 
orders of magnitude (4-13). A few unicellular
organisms have very high 6mA/A levels: 0.4% 
in Chlamydomonas reinhardtii (4), 0.66% in 
Tetrahymena thermophila (5), and as much as 
2.8% in early-diverging fungi (6). In contrast, 
6mA/A levels reported in multicellular eu- 
karyotes are much lower: ~0.1% to ~0.0001%, 
or undetectable (8, 10-12, 14, 15). Nonetheless, 
important functions have been assigned to 
6mA in eukaryotes, suggesting additional epi- 
genetic mechanisms in basic biology and hu- 
man diseases (17). However, other studies have 
cast doubt on the existence and levels of 6mA 
in eukaryotic DNA (15-19). For example, liquid
chromatography coupled with tandem mass 
spectrometry (LC-MS/MS) can reliably quantify 
6mA with high sensitivity, but it cannot discriminate
eukaryotic 6mA from bacterial 6mA
contamination (16, 20). Unique metabolically
generated stable isotope labeling can address
this limitation of LC-MS/MS (17, 18); however,
it can only be used in cultured cells. Anti-6mA
antibody-based dot blotting is commonly used
to estimate 6mA levels (4, 5, 7, 9-12), but it
cannot rule out bacterial contamination. In
addition, anti-6mA antibody-based DNA immunoprecipitation
sequencing (DIP-seq) is
often used for 6mA mapping (7, 8, 10, 13, 21),
but it can be confounded by 6mA-independent
factors such as DNA secondary structures
(20) and RNA contamination (15). Restriction
enzyme-based 6mA analyses are constrained by
their limited recognition motifs (4, 22). Single-molecule
real-time (SMRT) sequencing (23) and
nanopore sequencing (24) provide opportunities
for directly mapping 6mA events (3, 25, 26), but
the existing methods are mainly for mapping
6mA in prokaryotes and protozoa with high
6mA abundance (3, 14, 26-29). For eukaryotes
with low 6mA abundance, these methods are
prone to yield many false positive calls due to
low sensitivity (14-16).

1Department of Genetics and Genomic Sciences and Icahn
Institute for Genomics and Multiscale Biology, Icahn School of
Medicine at Mount Sinai, New York, NY 10029, USA. 2State
Key Laboratory of Environmental Chemistry and
Ecotoxicology, Research Center for Eco-Environmental
Sciences, Chinese Academy of Sciences, Beijing 100085,
China. 3Department of Neurosurgery and Oncological
Sciences, Icahn School of Medicine at Mount Sinai, New York,
NY 10029, USA. 4Black Family Stem Cell Institute, Icahn
School of Medicine at Mount Sinai, New York, NY 10029, USA.
5Sema4, a Mount Sinai Venture, Stamford, CT 06902, USA.
6Center for Advanced Biotechnology and Medicine, Rutgers
University, New Brunswick, NJ 08854, USA.

*Corresponding author. Email: gang.fang@mssm.edu

†These authors contributed equally to this work.

Kong et al., Science 375, 515-522 (2022)

The lack of a reliable technology that ac- 
curately quantifies 6mA/A levels in eukaryotic 
genomes motivated us to develop a method, 
named 6mASCOPE, for quantitative 6mA de- 
convolution (Fig. 1). The method, based on a 
short-insert SMRT library design (Fig. 1A), 
examines all DNA molecules sequenced in a 
genomic DNA (gDNA) sample, separates the 
total sequences into different sources, and quan- 
titatively deconvolves the total 6mA events 
into each of the sources (Fig. 1B). We first 
validated our method over a wide range of 
6mA/A levels, from 10^-6 to 10^-1, and then
examined a number of eukaryotes. 


A method for quantitative 6mA deconvolution


Existing SMRT sequencing-based methods 
for modification detection require a reference 
genome, as they compare the interpulse dura- 
tion (IPD) associated with a base of interest 
in the native DNA to the expected IPD value 
estimated according to the base and its flank- 
ing DNA sequence in the provided reference 
genome (25, 29, 30). Within this design, only 
those sequencing reads that map to the 
provided reference genome are analyzed for 
6mA, ignoring potential bacterial contam- 
ination, which is known to have abundant 
6mA events. 

To help solve this problem, we took a meta- 
genomic approach. First, in contrast to ex- 
isting methods that depend on a reference 
genome for IPD analysis, we took a reference- 
free approach by using the circular consensus 
sequence (CCS, a feature of SMRT sequencing 
for error correction) of an individual DNA 
molecule as its molecule-specific reference 
for IPD analysis (23, 25, 31) (Fig. 1A), thus 
examining all the sequenced genetic contents 
for 6mA analysis. We designed relatively short 
SMRT insert libraries of 200 to 400 base pairs 
(fig. S1A) (31) so that each DNA molecule could
be sequenced for a large number of passes
(mean, 272x; median, 181x; Fig. 1A and fig.
S1B), which facilitated a CCS base calling accuracy
of >99.84% (Phred score 28; fig. S2) (31)
and enabled reliable IPD analysis on single
molecules (Fig. 2, A and B). We then used a
metagenomic approach to map the CCS reads
to a comprehensive collection of genomes (31)
and performed 6mA quantification (described 
below) separately for each subgroup of genetic 
contents in a gDNA sample: species of interest, 
genomic regions of interest, and sources of 
contamination. 
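
Two quantities in this paragraph can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the toy IPD values are hypothetical, and the expected IPD is simply passed in as a number rather than estimated in silico from the CCS as in the actual method. It converts a base-calling accuracy to a Phred score and computes a single-molecule IPD ratio as the mean observed IPD across passes divided by the expected IPD.

```python
import math
from statistics import mean

def phred_from_accuracy(accuracy):
    """Phred quality Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(1.0 - accuracy)

def ipd_ratio(observed_ipds, expected_ipd):
    """Mean IPD over repeated passes of one molecule, divided by the
    IPD expected for the same sequence context without modification."""
    return mean(observed_ipds) / expected_ipd

# 99.84% accuracy corresponds to roughly Phred 28.
print(round(phred_from_accuracy(0.9984)))

# Toy numbers (hypothetical): a methylated adenine slows the polymerase,
# so its ratio sits well above 1; an unmodified one stays near 1.
print(ipd_ratio([2.4, 2.6, 2.5, 2.5], expected_ipd=0.5))  # about 5
print(ipd_ratio([0.5, 0.6, 0.4, 0.5], expected_ipd=0.5))  # about 1
```

Averaging over many passes is what makes the per-molecule ratio stable enough to interpret, which is the motivation for the short-insert library design.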

The current standard method to detect 6mA 
from SMRT sequencing is based on a defined 
cutoff on a modification quality value (QV; 
essentially a transformed P value) (3, 28, 31, 32). 
Because QV varies markedly over sequencing 
depth or number of CCS passes on individual 
molecules (Fig. 2C) (28, 30), a fixed cutoff can 
create false positive 6mA calls, especially from 
genomic regions with high sequencing depth 
(e.g., mitochondrial genomes). We built on a 
critical observation of linear increase (slope 
~1.7 for 6mA events) of QV over CCS passes 
(better separation from nonmethylated adenines 
at higher coverages; Fig. 2, C and D) and 
developed a machine learning model for 6mA 
quantification from QV values calculated in 
the reference-free single-molecule IPD anal- 
ysis. The core idea was to train the machine 
learning model across a wide range of 6mA/ 
A levels (training datasets described below) 
and to use the model to predict 6mA/A levels 
of newly sequenced gDNA samples based on 
the collective QV distribution instead of an 
arbitrary QV cutoff (Fig. 2D) (31).
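
To see why a fixed QV cutoff misbehaves when QV grows with the number of passes, consider a minimal sketch. Everything here is a simplifying assumption for illustration: QV is modeled as exactly proportional to CCS passes, with the ~1.7 slope quoted above for 6mA and a hypothetical, smaller slope of 0.3 for nonmethylated adenines; the cutoff of 30 and the slope threshold of 1.0 are likewise arbitrary. The actual method feeds the collective QV distribution to a trained model rather than thresholding a slope.

```python
SLOPE_6MA = 1.7           # approximate slope quoted in the text for 6mA events
SLOPE_UNMETHYLATED = 0.3  # hypothetical value, chosen only for illustration
FIXED_QV_CUTOFF = 30      # arbitrary cutoff, not taken from the paper

def modeled_qv(slope, passes):
    """Toy linear model: QV = slope * number of CCS passes."""
    return slope * passes

def call_by_fixed_cutoff(qv):
    """Call 6mA whenever QV reaches the fixed cutoff."""
    return qv >= FIXED_QV_CUTOFF

def call_by_slope(qv, passes, min_slope=1.0):
    """Normalise QV by pass count before thresholding (min_slope is hypothetical)."""
    return qv / passes >= min_slope

# Columns: passes, fixed-cutoff call (6mA, unmethylated), slope call (6mA, unmethylated)
for passes in (10, 50, 200):
    qv_m = modeled_qv(SLOPE_6MA, passes)
    qv_u = modeled_qv(SLOPE_UNMETHYLATED, passes)
    print(passes,
          call_by_fixed_cutoff(qv_m), call_by_fixed_cutoff(qv_u),
          call_by_slope(qv_m, passes), call_by_slope(qv_u, passes))
```

In this toy model the fixed cutoff misses the 6mA site at 10 passes and calls the nonmethylated adenine at 200 passes, whereas the pass-normalized value separates the two at every depth.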


Fig. 1. Overview of 6mASCOPE for quantitative 6mA deconvolution.
(A) Reference-free 6mA analysis of single molecules. Each molecule (short
insert) is sequenced for a large number of passes (subreads). The subreads
are combined to a circular consensus sequence (CCS), serving as the molecule-specific
reference for in silico IPD estimation, and they provide repeated
measures of IPD values for 6mA analysis (31). Blue segment denotes SMRT
adapter. (B) After single-molecule 6mA analysis (a red dot indicates a 6mA
event), CCSs (black rods) from a sequenced gDNA sample are separated
into the eukaryotic genome (green) and contamination sources (blue and yellow).
The 6mA/A levels of each species (or genomic region) are estimated using a
machine learning model trained across a wide range of 6mA abundance, with
defined confidence intervals.

We constructed high-quality benchmark
datasets for the machine learning model
training. For 6mA negative controls, we used
HEK-WGA [whole-genome amplification of
human embryonic kidney (HEK)-293 cell gDNA,
6mA/A level <10^-6 by ultrahigh-performance
liquid chromatography-tandem mass spectrometry
(UHPLC-MS/MS)], HEK293 (native
gDNA, 6mA/A level <10^-6 by UHPLC-MS/MS),
and HEK-WGA-MsssI (CpG sites in vitro
methylated using a 5mC methyltransferase,
MsssI), with the latter two representing the
influence of 5mC events on IPD (16, 25, 31).
These samples were each methylated in vitro
using three bacterial 6mA methyltransferases
(Dam, GATC; TaqI, TCGA; and EcoRI, GAATTC)
to create three positive controls: HEK-WGA-3M,
HEK293-3M, HEK-WGA-MsssI-3M (fig. S3).
By mixing negative and positive controls in
silico at different ratios, we created a wide
range of 6mA/A levels (10^-1 to 10^-6) for the
model training (Fig. 2E) (31). Using leave-one-out
cross-validation, we compared several models
(fig. S4) and selected Random Forest. Our
model showed reliable quantification of 6mA/A
levels with defined 95% confidence intervals
(CIs; Fig. 2F and fig. S5) (31). CI depends on
both 6mA/A level and number of CCS reads
(Fig. 2F and fig. S5B) (31), which facilitated
dataset-specific CI estimation along with 6mA
quantification.

In contrast to existing methods (table S1),
6mASCOPE takes a metagenomic approach
and specifically quantifies 6mA events in
eukaryotic genomes over contamination,
because CCS reads, grouped by species (or
specific genomic regions), are separately quantified
for 6mA/A levels. For validation, we
applied 6mASCOPE on a series of in vitro
mixed E. coli, Helicobacter pylori, and Saccharomyces
cerevisiae samples with a wide range
of 6mA/A levels (10° to 10°° by UHPLC-MS/MS)
and found that 6mASCOPE reliably deconvolved
different sources into expected
ratios along with stable 6mA quantification
(fig. S6).
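
The in silico mixing used to build the training range can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the positive- and negative-control 6mA/A levels passed in are hypothetical placeholders, and mixing is treated as a simple weighted average of the two levels. The grid of 51 levels between 10^-1 and 10^-6 matches the count given in the Fig. 2E legend; the even spacing of 0.1 log10 units is an assumption.

```python
def log_spaced_levels(start_exp=-1.0, stop_exp=-6.0, step=0.1):
    """Target 6mA/A levels from 10**start_exp down to 10**stop_exp."""
    n = round((start_exp - stop_exp) / step) + 1
    return [10 ** (start_exp - i * step) for i in range(n)]

def positive_fraction(target, positive_level, negative_level):
    """Fraction of positive-control adenines needed so that the mixture
    reaches `target`, assuming the two levels average linearly."""
    if not negative_level <= target <= positive_level:
        raise ValueError("target lies outside the range spanned by the controls")
    return (target - negative_level) / (positive_level - negative_level)

levels = log_spaced_levels()
print(len(levels))  # 51

# Hypothetical control levels: 0.2 for the positive pool, 1e-7 for the negative pool.
for target in (1e-1, 1e-3, 1e-6):
    print(target, positive_fraction(target, positive_level=0.2, negative_level=1e-7))
```

Because the targets span five orders of magnitude, the required positive fraction also shrinks by roughly five orders of magnitude across the grid.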


High-resolution insights of 6mA deposition in 
two protozoans 


Fig. 2. 6mASCOPE method evaluation. (A) IPD ratios (the mean IPD in the
native sample divided by the IPD expected from the in silico model) on illustrative
molecules from E. coli wild-type strain K12 MG1655 and 6mA-free strain ER3413.
Blue segment denotes SMRT adapter. (B) IPD ratio of adenines on GATC motif in
E. coli K12 MG1655 and ER3413. 6mA events have IPD ratios of ~5; nonmethylated
adenines have IPD ratios of ~1. (C) Modification quality values (QVs) of 6mA linearly
deviate from the nonmethylated adenines (slope ~1.7), with better separation at
high numbers of CCS passes. For illustration, kernel density estimation of adenines
with QC > 50 is shown. Left: 6mA in GATC, GCACNNNNNNGTT, and AACNNNNNNTGC
from E. coli K12 MG1655. Right: Nonmethylated adenines in E. coli ER3413. (D) QV
distribution varies across different 6mA/A levels. (E) Feature vectors used for
machine learning model training. In each row, one of 51 6mA/A levels (10^-1 to 10^-6) is
constructed by mixing negative and positive controls in silico at different
ratios. Each column represents the percentage (averaged across 300 replicates,
log10-transformed) of adenines over a number of slopes across CCS pass numbers
20 to 240, divided into 11 bins (31). (F) For each 6mA quantification (x axis),
6mASCOPE also provides the 95% confidence interval (y axis) (31). Colors
represent the number of CCS reads used for 6mA quantification.

Although previous studies reported enrichment
of 6mA events in the linkers near transcription
start sites (TSSs) in two protozoans,
C. reinhardtii and T. thermophila (4, 5), it
remains unclear which specific regions within
the linkers are enriched for 6mA events.
We sequenced both organisms using the
SMRT method and obtained 862,205 and
975,050 CCS reads, respectively, for single-molecule
6mA analysis (table S2) (31). We first
verified that 6mA has a periodic pattern inversely
correlated with nucleosomes near TSSs
(fig. S7) (31). Next, by dividing genomic regions
between the nucleosome dyad and
the middle of each nucleosome linker into
10 bins (31) and quantifying 6mA/A levels in
each bin using 6mASCOPE, we found that
6mA was enriched at the nucleosome-linker
boundaries in C. reinhardtii (Fig. 3, A and D)
instead of at the middle of the linkers, as
previously reported. In contrast, 6mA/A levels
of T. thermophila increased from the nucleosome
boundaries to the middle of linkers
(Fig. 3, A and E, and fig. S8). We further used
6mASCOPE to examine the enrichment of
6mA across different motifs. For C. reinhardtii,
we confirmed that 6mA is enriched in the
VATB motif (Fig. 3B; V = A, C, or G; B = C, G,
or T) and is essentially absent in non-VATB
motifs; for T. thermophila, although 6mA was
reported to be enriched across the NATN
motif (5), our 6mASCOPE analysis revealed
that VATB sites have a higher 6mA/A level
than TATN and NATA sites by a factor of 2 to
3 (Fig. 3C).
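
The motif groups compared above can be made concrete with a short sketch that assigns an adenine to VATB, TATN/NATA, or other from its sequence context. The function name and the four-base context convention (one base upstream, the adenine, then two bases downstream) are assumptions made for illustration; only the IUPAC definitions V = A, C, or G and B = C, G, or T come from the text.

```python
V_BASES = set("ACG")  # IUPAC V: any base except T
B_BASES = set("CGT")  # IUPAC B: any base except A

def classify_adenine_context(context):
    """Classify a 4-base context whose second base is the adenine of interest.

    Returns "VATB", "TATN/NATA", or "other" (adenine not followed by T).
    """
    context = context.upper()
    if len(context) != 4 or context[1] != "A":
        raise ValueError("expected a 4-base context with A at position 2")
    upstream, _, next_base, downstream = context
    if next_base != "T":
        return "other"
    if upstream in V_BASES and downstream in B_BASES:
        return "VATB"
    # Remaining NATN contexts start with T or end with A.
    return "TATN/NATA"

for ctx in ("GATC", "TATC", "CATA", "GACC"):
    print(ctx, classify_adenine_context(ctx))
```

Grouping adenines this way before quantification is what allows a per-motif 6mA/A level to be compared between the two organisms.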


Fig. 3. 6mASCOPE reveals high-resolution 6mA deposition in C. reinhardtii
and T. thermophila. (A) 6mA deposition relative to nucleosomes and linkers
in C. reinhardtii and T. thermophila. Genomic regions between the nucleosome
dyad and the linker center are divided into 10 bins (x axis) across the genome.
The 6mA/A level (y axis) was quantified with 6mASCOPE. Error bars are
95% CIs. (B) 6mA is enriched in the VATB motif at nucleosome-linker boundaries
in C. reinhardtii. Adenines in each bin are divided into three groups: VATB,
TATN/NATA, and others. The dashed line indicates the trend of 6mA/A levels
from nucleosome dyad to linker center; x and y axes are the same as in (A). Error
bars are 95% CIs. (C) 6mA is enriched across the NATN motif at linkers
in T. thermophila. (D and E) Illustrative examples of 6mA enrichment in
C. reinhardtii (D) and T. thermophila (E). Nucleosome occupancy (green stack)
is based on MNase-seq data (31). Nucleosomes (green lines) and dyads
(green dots) are determined by iNPS (v1.2.2). SMRT CCS reads (Mi) are
shown with red (forward strand) and blue (reverse strand) lines. IPD ratios
of 3 or higher are shown. (F) Schematic of 6mA enrichment at the
nucleosome-linker boundaries in C. reinhardtii and the gradual 6mA increase
from nucleosome boundaries to linker centers in T. thermophila.


6mA from commensal bacteria contribute to
most 6mA events in insect and plant samples


A previous study quantified 6mA in D.
melanogaster using UHPLC-MS/MS and
reported that 6mA/A reaches the peak level
of ~700 ppm (parts per million) in ~0.75-hour
embryos and falls to ~10 ppm at later stages
such as adult tissues (8). We first collected
the fly embryo sample at ~0.75 hours and got
674,650 SMRT CCS reads for single-molecule
6mA analysis (table S2). Despite strict measures
to avoid contamination (31), we found
that 96.12% of the CCS reads mapped to the 
D. melanogaster genome reference, whereas 
3.88% of the CCS reads mapped to a few 
microbes (Fig. 4A). Specifically, the contami- 
nation reads came from S. cerevisiae (1.65%), 
the major food source of Drosophila (33), and 
two genera of bacteria, Acetobacter (0.86%) 
and Lactobacillus (0.23%), the main gut com- 
mensal bacteria of D. melanogaster (34). We 
separately quantified 6mA/A levels in the 
D. melanogaster genome and in each con- 
tamination source and found that the level of 
6mA/A in total gDNA was 100 ppm (CI, 50 
to 200 ppm, consistent with the ~121 ppm 
UHPLC-MS/MS estimate), 2 ppm in D. melano- 
gaster (CI, 1 to 10 ppm), 2 ppm in Saccharomyces 
(CI, 1 to 10 ppm), 5495 ppm in Acetobacter (CI, 
3162 to 10,000 ppm), 977 ppm in Lactobacillus 
(CI, 501 to 1995 ppm), and 7413 ppm in Others 
(including additional bacterial genera and 
unannotated sequences; CI, 3981 to 12,589 
ppm) (Fig. 4B and fig. S9) (31). Despite their
relatively low abundance (3.88%), bacteria con- 
tributed to most of the 6mA events in the total 
gDNA (Fig. 4C). In Acetobacter, we observed a 
high-confidence bacterial 6mA motif (GANTC) 
(Fig. 4B), consistent with the REBASE database 
(35). The 6mA/A level of 2 ppm (CI, 1 to 10 ppm) 
estimated for D. melanogaster, in contrast to the 
~700 ppm previously reported, only explains 
1.44% of the total 6mA events in the gDNA sample
(considering taxonomy abundances; Fig. 4C).
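
The statement that the fly genome explains only a small share of total 6mA follows from weighting each subgroup's 6mA/A level by its share of reads. The sketch below reproduces that arithmetic from the rounded values quoted in this paragraph; it is a back-of-the-envelope check, not the 6mASCOPE implementation. Two simplifications are assumed: read share is used as a proxy for adenine share, and the abundance of "Others" is taken as the remainder of the 3.88% non-fly reads. Because the inputs are rounded point estimates, the result lands near, not exactly on, the reported percentage.

```python
# (read share in percent, 6mA/A in ppm), rounded values quoted in the text
subgroups = {
    "D. melanogaster": (96.12, 2),
    "Saccharomyces": (1.65, 2),
    "Acetobacter": (0.86, 5495),
    "Lactobacillus": (0.23, 977),
    # Assumption: remainder of the 3.88% non-fly reads
    "Others": (3.88 - 1.65 - 0.86 - 0.23, 7413),
}

def contributions(groups):
    """Percent of all 6mA events attributable to each subgroup."""
    weighted = {name: share * ppm for name, (share, ppm) in groups.items()}
    total = sum(weighted.values())
    return {name: 100.0 * w / total for name, w in weighted.items()}

def pooled_level_ppm(groups):
    """Abundance-weighted 6mA/A level of the whole sample, in ppm."""
    return sum(share * ppm for share, ppm in groups.values()) / 100.0

shares = contributions(subgroups)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
print(f"pooled level: {pooled_level_ppm(subgroups):.0f} ppm")
```

With these inputs the fly genome accounts for roughly 1.4% of 6mA events, and the pooled level comes out near 136 ppm, inside the 50 to 200 ppm interval reported for total gDNA.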

We next applied 6mASCOPE to examine a 
D. melanogaster adult sample (whole animal), 
which showed very different microbiome com- 
position with extremely low bacteria contam- 
ination, yet still no evidence of a high 6mA/A 
level in Drosophila (fig. S10). We also rean- 
alyzed the 6mA DIP-seq data from a previous 
D. melanogaster study (8) and found reads 
that mapped to multiple bacterial genomes. 
It is also worth noting that N4-methylcytosine
(4mC), another form of DNA methylation 
prevalent in bacteria, was also detected in 
CCS reads from Acetobacter enriched at GTAC 
sites (fig. S11), a motif previously reported in 
Acetobacter (35). This observation shows that 
4mC analysis for eukaryotic organisms also 
should be cautiously examined for possible 
bacterial contamination. 

In addition to insects, we hypothesized that 
soil bacteria can confound 6mA analysis in 
plants. We applied 6mASCOPE to A. thaliana 
21-day-old seedlings (31), which were reported
as having ~2500 ppm 6mA/A by LC-MS/MS 
(9). Among the total 535,030 SMRT CCS reads 
for single-molecule 6mA analysis, 98.52% could 
be mapped to the A. thaliana genome (Fig. 4D). 
Among the other 1.48% (subgroup Others), 
24.12% were annotated and classified (using 
Kraken2) into several phyla: Proteobacteria 
(fig. S12), Actinobacteria, Bacteroidetes, and
Firmicutes. These phyla and classes (Fig. 4E
and fig. S12) are consistent with the A. thaliana
root microbiome (36). Using 6mASCOPE,
we separately quantified 6mA/A levels for
A. thaliana (3 ppm; CI, 1 to 10 ppm) and Others
(3981 ppm; CI, 1995 to 7943 ppm) and found
that CCS reads mapped to A. thaliana contributed
to only 4.21% of the total 6mA events
in the total gDNA sample (Fig. 4, F and G).
Consistently, 6mASCOPE analysis of the
A. thaliana 21-day-old root sample also demonstrated
remarkable microbiome contamination
(greater than the seedlings), with a smaller
contribution from A. thaliana to the total 6mA
events (fig. S13).

Fig. 4. 6mASCOPE analyses show that commensal bacteria contribute to the vast majority
of 6mA events in insect and plant samples. (A) Taxonomic compositions (percent) in the D. melanogaster
embryo ~0.75-hour gDNA sample. CCS reads mapped to Acetobacter or Lactobacillus are summarized
by genus. (B) 6mA quantification of the D. melanogaster genome and contaminations. For each
subgroup, 6mA/A levels are quantified by 6mASCOPE (error bars are 95% CIs). QV distributions are
shown at bottom (colored dots refer to species/genus colors in main panel). 6mA/A level of S. cerevisiae
is further examined with additional sequencing (fig. S9). CCS reads from Acetobacter, Lactobacillus,
and Others (e.g., low-abundant bacteria) are grouped together because CCS read counts within each
subgroup are low; CIs are defined on the basis of 8000 CCS reads. Arrow denotes the density of
IPD ratios in the GANTC motif in Acetobacter. (C) 6mA contribution (percent) from each subgroup
in the D. melanogaster embryo sample. (D and E) Taxonomic compositions (percent) in the A. thaliana
21-day seedling gDNA sample. The CCS reads in subgroup “Others” (D) are classified with Kraken2.
Main classes of Proteobacteria are shown in fig. S12. (F) 6mA quantification of the A. thaliana genome
and the contamination (Others). (G) 6mA contribution (percent) from each subgroup in the A. thaliana
seedling sample.

Fig. 5. 6mASCOPE-based quantitative deconvolution across multiple
human gDNA samples. (A) 6mA/A levels on the genome of interest
quantified by 6mASCOPE (error bars are 95% CIs). The 6mA/A level in
S. cerevisiae is consistent with independent UHPLC-MS/MS measurement
(0.3 ppm, lower than the minimum 6mA/A level used in the 6mASCOPE
training dataset). Except for D. melanogaster embryo and A. thaliana gDNA
samples (both are contaminated by bacteria), 6mA/A levels by 6mASCOPE
are consistent with UHPLC-MS/MS (red cross). For all samples except
HEK-WGA-3M and HEK293-dam, the UHPLC-MS/MS is performed independently
using the same batch of gDNA samples. For HEK-WGA-3M and
HEK293-dam, the UHPLC-MS/MS estimates are mimicked: Nearly all the
expected motif(s) are methylated in vitro by the methyltransferase(s). The QV
distribution for each gDNA sample is shown at the top. (B) Sources (percent)
of CCS reads in the HEK-pCI sample (transfection of an empty pCI plasmid
into HEK293 cells). (C) 6mA quantification (percent) of different sources in
HEK-pCI. CCS reads from E. coli and Others are grouped together, and their CIs
are determined on the basis of 8000 CCS reads. (D) 6mA contribution
(percent) from the subgroups in the HEK-pCI sample.


6mASCOPE finds no evidence of high abundance 
of 6mA in the human cells examined 


We next examined the abundance of 6mA in 
human cells and tissues. We chose to investigate 
peripheral blood mononuclear cells (PBMCs),
which are composed of 70 to 90% lymphocytes 
(37), because lymphocytes have been shown to 
have a high 6mA/A level of ~0.051% (510 ppm) 
(12). We also collected and examined two 
glioblastoma brain tissue samples because 
glioblastoma stem cells and primary glioblas- 
toma were reported to have a 6mA/A level of 
~1000 ppm by dot blotting and mass spec- 
trometry (2). 

We obtained 570,283, 247,700, and 280,763 
SMRT CCS reads from the PBMC sample and 
the two glioblastoma brain tissues, respectively, 
for single-molecule 6mA analysis. Of these, 
99.53%, 99.88%, and 99.86% of CCS reads 
were mapped to the human reference genome, 
indicating highly pure samples. The 6mA/A 
levels estimated by 6mASCOPE in glioblastoma
samples were ~10^-6, with 3 ppm for
glioblastoma-1 (CI, 1 to 16 ppm) and 2 ppm for 
glioblastoma-2 (CI, 1 to 13 ppm) (Fig. 5A) (32).
This level is comparable to the negative con- 
trols with extremely low 6mA/A levels: HEK- 
WGA (1 ppm; CI, 1 to 6 ppm) and native HEK293 
(1 ppm; CI, 1 to 6 ppm), when the confidence 
intervals are taken into consideration. In the 
PBMC sample, the 6mA/A level estimation 
of 17 ppm (CI, 4 to 63 ppm) by 6mASCOPE is 
consistent with the measurements of UHPLC- 
MS/MS (Fig. 5A). These data suggested 
either that the abundance of 6mA, if present in 
glioblastoma and PBMCs, was much lower 
than the reported levels in the recent studies 
(glioblastoma, ~1000 ppm; lymphocytes, 
~510 ppm) or that 6mA/A levels may be highly 
heterogeneous or variable between different 
samples of the same cell type, the same tis- 
sue, or a specific disease. Motif enrichment 
analysis did not support a reliable motif in 
these samples (fig. S14). 
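
The text moves between percent, ppm, and powers of ten when quoting 6mA/A levels. A small helper makes the conversions explicit; the function names are hypothetical and the example values are those quoted in this section.

```python
import math

def ppm_from_fraction(fraction):
    """Convert a 6mA/A fraction (e.g. 1e-6) to parts per million."""
    return fraction * 1e6

def ppm_from_percent(percent):
    """Convert a 6mA/A percentage (e.g. 0.051) to parts per million."""
    return percent * 1e4

def log10_fraction_from_ppm(ppm):
    """Express a ppm value as the base-10 exponent of the 6mA/A fraction."""
    return math.log10(ppm / 1e6)

print(ppm_from_percent(0.051))        # about 510 ppm, the level reported for lymphocytes
print(ppm_from_fraction(1e-6))        # about 1 ppm
print(log10_fraction_from_ppm(1000))  # about -3
print(log10_fraction_from_ppm(3))     # about -5.5, i.e. on the order of 10^-6
```

On this scale, the glioblastoma estimates of 2 to 3 ppm sit more than two orders of magnitude below the ~1000 ppm reported previously.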

Across all the samples examined in this 
study, we observed largely consistent 6mA/A 
level estimates between 6mASCOPE and UHPLC-MS/MS
(Fig. 5A) except the D. melanogaster
embryo and A. thaliana samples, for which 
the much higher 6mA/A estimates by UHPLC- 
MS/MS were due to bacterial contamination 
(Fig. 4), highlighting the capability and reliability
of 6mASCOPE. In addition to 6mA quantification
of individual species, our method was
also able to quantify 6mA/A levels in specific 
genomic regions of interest. Previous studies 
have reported enrichment of 6mA in mitochondrial
DNA (mtDNA) (12, 13, 27, 38) and
in young full-length LINE-1 elements (L1s) 
(10, 11, 21). For mtDNA, 6mASCOPE did not 
find 6mA enrichment in the 7205 CCS reads 
from the HEK293 sample that mapped to 
mtDNA, in comparison to a negative control 
(targeted mitochondrial genome amplification,
10°”; CI, 10° to 10 *; fig. S15). For L1
elements, although 6mASCOPE appeared to 
suggest a higher 6mA/A level in the young 
full-length L1s than in older L1s, a further
comparison with a WGA negative control did 
not support 6mA enrichment in young L1 
elements (fig. S16), highlighting the impor- 
tance of using negative controls to capture 
possible uncharacterized biases (14, 39). This 
result was consistent with our previous study 
of human lymphoblastoid cells, in which in- 
creased IPD patterns exist not only in adenines 
but also in cytosines, guanines, and thymines of 
young L1 elements, which suggested confounding 
factors such as secondary structure (14).


Plasmids used for genetic manipulation can 
carry confounding bacterial-origin 6mA 


Genetic manipulation is commonly used in 
epigenetic research to characterize putative 
methyltransferases and demethylases. E. coli 
is often used as a host for plasmid selection 
and expansion. As a result, the plasmids can 
contain 6mA events written by bacterial 
methyltransferase(s) and can confound 6mA 
study in eukaryotic cells. 

To illustrate this, we transfected an empty 
pCI plasmid vector from E. coli into HEK293 
cells, following the standard lipofection-based 
protocol (32). Total gDNA harvested at 72 hours 
after transfection was sequenced using SMRT 
technology and analyzed using 6mASCOPE. 
Among the 741,558 CCS reads, 95.99% were 
mapped to the human genome and 3.75% 
came from the pCI vector (Fig. 5B), and the 
remaining 0.26% of CCS reads (Fig. 5B) in- 
cluded reads that mapped to the E. coli genome 
(31), implying possible carryover of gDNA from
E. coli to the HEK293 cells during transfection. 
By separately quantifying the 6mA/A level in 
each subgroup, pCI showed a high 6mA/A level 
of 10^-1.6 (25,119 ppm), about the same as E. coli
(Fig. 5C). Considering its abundance, pCI con- 
tributed to 93.91% of the total 6mA events in 
this post-transfection HEK293 total gDNA 
(Fig. 5, C and D). Hence, genetic manipulation 
experiments involving plasmids may confound
the characterization of putative 6mA
methyltransferases and demethylases. Although 
the use of methylation-free bacteria as the host 
for plasmid preparation can avoid this type of 
contamination, it is worth noting that the Dam 
methyltransferase mutant E. coli, previously
used in a few studies (7, 38), still has sub- 
stantial 6mA events because of the remaining 
6mA methyltransferase hsdM (2, 28) (fig. S17, 
based on 6mASCOPE analysis). We therefore 
suggest the use of E. coli strains with both
Dam and hsdM deleted as the plasmid host. 
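
The per-source accounting described above can be sketched as a short calculation. This is a minimal illustration, not the 6mASCOPE implementation: it assumes that the number of adenines from each source is proportional to its share of CCS reads, and the 6mA/A levels for the human and "other" groups are hypothetical placeholders. Only the read fractions and the pCI level (25,119 ppm) come from the text.

```python
def sixma_contributions(read_fraction, level_ppm):
    """Return each source's share of the total 6mA events.

    Assumption: adenine counts scale with read fraction, so the number
    of 6mA events per source is proportional to fraction * 6mA/A level.
    """
    events = {src: read_fraction[src] * level_ppm[src] for src in read_fraction}
    total = sum(events.values())
    return {src: ev / total for src, ev in events.items()}


# Read fractions reported in the text (percent of CCS reads).
read_fraction = {"human": 95.99, "pCI": 3.75, "other": 0.26}

# 6mA/A levels in ppm. The pCI value is from the text; the human and
# "other" values are HYPOTHETICAL placeholders for illustration only.
level_ppm = {"human": 50.0, "pCI": 25119.0, "other": 25119.0}

shares = sixma_contributions(read_fraction, level_ppm)
for src, share in shares.items():
    print(f"{src}: {share:.2%} of 6mA events")
```

Under these assumed inputs, a source that makes up only a few percent of the reads can still account for most 6mA events, because its 6mA/A level is orders of magnitude higher than that of the host genome.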


Discussion 


Our study cannot exclude the potential pres- 
ence of authentically high levels of 6mA/A in 
multicellular eukaryotes in certain samples 
that we did not examine here. However, our 
results suggest that a reassessment of 6mA 
across eukaryotic genomes, using 6mASCOPE 
to quantitatively estimate the confounding 
impact of bacterial contamination, is warranted. 
To facilitate the broad use of 6mASCOPE, we 
have released a detailed experimental protocol 
and an automated software package on Zenodo 
(40) and GitHub. 

We caution that plasmid 6mA contamina- 
tion, even from Dam methyltransferase mutant 
E. coli, is possible during genetic manipulation 
and may have confounded previous charac- 
terizations of 6mA enzymes. Lipofection or 
electroporation, which is used to transfect 
plasmid DNA directly into the target cells, is 
more likely to introduce contamination, whereas 
lentiviral transduction would be less affected if 
the original plasmids are completely removed 
during viral packaging. 

Our 4mC result suggests that similar cau- 
tion should be exercised when studying 4mC 
in eukaryotes by means of SMRT sequencing, 
which has found 4mC in several eukaryotes 
[see (41)], despite SMRT sequencing being
prone to making false positive calls (16), es- 
pecially given the lack of evidence for 4mC in 
mice even when ultrasensitive UHPLC-MS/MS 
is used (19). More broadly, this study will also 
help to guide rigorous technological develop- 
ment for the detection of other forms of rare 
DNA and RNA modifications. 

Our study has a few limitations: (i) The 
focus of 6mASCOPE is more about quantita- 
tively deconvolving the global 6mA/A level 
into different species and genomic regions of 
interest, rather than mapping specific 6mA
events in a particular genome. We prioritized 
this focus because the most controversial 6mA 
findings to date were those reporting high 
6mA/A levels in multicellular eukaryotes. The 
precise mapping of specific 6mA events in a 
particular genome would require deeper SMRT 
sequencing and can be pursued in future work. 
(ii) For reliable data interpretation, it is important
to combine the 6mA/A levels estimated
by 6mASCOPE with their confidence intervals,
which depend on sequencing depth. How- 
ever, even with a large number of CCS reads, 
6mASCOPE does not precisely differentiate 
6mA/A levels below 10 ppm because the con- 
fidence interval includes 1 ppm, which is the 
lowest 6mA/A level in our training dataset 
(Fig. 2F) (37). (iii) Two recent studies reported 
that ribo-m6A on mRNA can be a source of 
6mA on DNA via the nucleotide-salvage pathway 
(7, 18). 6mA events that are misincorporated via 
this pathway cannot be distinguished from other 
6mA events by SMRT sequencing or 6mASCOPE, 
and isotope labeling coupled with LC/MS-MS 
is needed instead (77). (iv) For each gDNA sam- 
ple, the CCS reads analyzed by 6mASCOPE only 
represent the DNA molecules that were se- 
quenced by SMRT sequencing. Although SMRT 
DNA polymerases can effectively sequence 
through diverse genomic regions with very 
complex secondary structures (42), they might
miss some DNA molecules with certain unknown 
properties. (v) Although 6mASCOPE enables 
quantitative 6mA deconvolution, it could be 
confounded by other DNA modifications that 
indirectly influence SMRT DNA polymerase 
kinetics of adenines or flanking bases (3, 25, 30), 
so we suggest combining LC/MS-MS and
6mASCOPE for 6mA quantification and de- 
convolution of eukaryotic gDNA samples. 
Discovery of genomic loci of the human cerebral
cortex using genetically informed brain atlases 
To determine the impact of genetic variants on the brain, we used genetically informed brain atlases in
genome-wide association studies of regional cortical surface area and thickness in 39,898 adults and 9136 
children. We uncovered 440 genome-wide significant loci in the discovery cohort and 800 from a post hoc 
combined meta-analysis. Loci in adulthood were largely captured in childhood, showing signatures of negative 
selection, and were linked to early neurodevelopment and pathways associated with neuropsychiatric risk. 
Opposing gradations of decreased surface area and increased thickness were associated with common 
inversion polymorphisms. Inferior frontal regions, encompassing Broca’s area, which is important for speech, 
were enriched for human-specific genomic elements. Thus, a mixed genetic landscape of conserved and 
human-specific features is concordant with brain hierarchy and morphogenetic gradients. 


Large-scale magnetic resonance imaging
and genetics datasets have afforded the
opportunity to discover common genetic
variants contributing to the morphology
of the human cortex. Studies in model
organisms have revealed intricate genetic 
mechanisms underlying cortical area and thick- 
ness (i.e., laminar) patterning, although it has 
been challenging to define aspects of cortical 
development that are shared across mammals 
as opposed to those that are human-specific 
(1). Nevertheless, many studies have shown 
support for the radial unit hypothesis, which 
posits differential neurodevelopment programs 
shaping and regulating these two cortical mea- 
sures (2). 

Consistent with this, the ENIGMA (Enhanc- 
ing Neuroimaging Genetics through Meta- 
Analysis) Consortium’s genome-wide association 
study (GWAS) of the human cortex found many 
variants associated with surface area and 
thickness linked to neurodevelopmental pro- 
cesses during fetal development (3). Such 
evidence for neurodevelopmental programming
indicates the need to investigate these ques- 
tions at earlier ages, as previous cortical GWASs 
have almost exclusively been conducted in 
older adults. 

Cortical expansion and regional patterning 
are largely genetically determined (2); there- 
fore, we used data-driven genetically informed 
atlases in this study (4, 5), rather than atlases 
primarily determined by sulcal-gyral patterns. 
These genetically determined atlases capture 
patterns of hierarchical genetic similarity fol- 
lowing known developmental gradients that 
shape the cortex along their anterior-posterior 
(A-P) and dorsal-ventral (D-V) axes, includ- 
ing 12 surface area and 12 thickness regions 
(2, 4, 5), and increase discoverability of genetic 
variants underlying the cortex (6). 


Results 
Genetic variants underlying cortical thickness 
and area 


In our discovery UK Biobank (UKB) sample 
of 32,488 individuals (table S1), we found 440 
genome-wide significant [mixed linear model
association tests (7), P < 5 × 10^-8] variants after
clumping each phenotype separately in PLINK
(8) [linkage disequilibrium (LD) R^2 = 0.1, 250 kb],
where 305 and 88 regional genetic variants 
were associated with the 12 surface area 
phenotypes and the 12 cortical thickness pheno- 
types, respectively (Fig. 1 and tables S2 and S3). 
Twenty-seven genetic variants were signifi- 
cantly associated with total surface area and 
20 variants with mean cortical thickness (table 
S2). After correction for multiple comparisons,
234 genetic variants remained significant (P <
2.27 × 10^-9, 5 × 10^-8/t_e, with t_e = 22 being the
effective number of independent traits). We
performed subsequent functional analyses for 
the 393 regional variants. Single-nucleotide 
polymorphisms (SNPs) were mapped to genes
on the basis of their genomic position with
FUMA (9). Across all phenotypes, SNPs were 
significantly enriched for noncoding regions 
(44.0% enriched for intronic variants, 33.4% 
for intergenic, and 17.7% for noncoding in- 
tronic RNA; Fisher’s exact test, P < 0.05) 
(Fig. 1 and table S4). 
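
The thresholds and variant counts quoted in this section can be checked with a few lines of arithmetic. The sketch below only re-derives numbers stated in the text (the conventional genome-wide threshold divided by the 22 effective traits, and the sum of the regional and global variant counts); the variable names are illustrative.

```python
GENOME_WIDE_ALPHA = 5e-8   # conventional genome-wide significance threshold
EFFECTIVE_TRAITS = 22      # effective number of independent traits (from the text)

# Threshold after correcting for the number of independent traits.
corrected_alpha = GENOME_WIDE_ALPHA / EFFECTIVE_TRAITS
print(f"corrected threshold: {corrected_alpha:.2e}")

# Variant counts reported in the text.
regional = 305 + 88           # regional surface area + regional thickness
total = regional + 27 + 20    # plus total surface area and mean thickness
print(f"regional variants: {regional}, all variants: {total}")
```

The corrected threshold evaluates to 2.27 × 10^-9, and the counts sum to the 393 regional and 440 total variants reported above.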


Replication and generalization 


Replication was performed on an admixed 
sample of 7410 individuals from UKB, includ- 
ing 2232 of European descent, using mixed 
linear model association (MLMA) analysis in 
GCTA (genome-wide complex trait analysis) 
(10). We modeled population structure using 
GENESIS (11) to estimate principal components
and kinship. Estimated genetic effects
in the discovery dataset were correlated with 
those in the replication dataset, as indexed 
by significant beta correlations (ranging from 
a correlation coefficient, r, of 0.66 to 0.95 after
correcting for errors in the estimated SNP 
effects) (fig. S1), sign concordance rate (bino- 
mial test, P < 0.05), and proportion of variants 
replicated after multiple comparison correc- 
tion (12). 
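
The sign concordance test mentioned above can be sketched as an exact binomial test. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a one-sided test against a null concordance probability of 0.5, and the counts in the example are hypothetical.

```python
from math import comb


def sign_concordance_pvalue(n_concordant, n_total):
    """One-sided exact binomial P value for sign concordance.

    Under the null hypothesis, each replication effect has the same
    sign as the discovery effect with probability 0.5.
    """
    tail = sum(comb(n_total, k) for k in range(n_concordant, n_total + 1))
    return tail / 2 ** n_total


# HYPOTHETICAL counts, for illustration only.
p = sign_concordance_pvalue(n_concordant=70, n_total=100)
print(f"P = {p:.3g}")
```

A concordance rate well above 50% across many variants gives a small P value even when few individual variants replicate at a strict threshold, which is why it is reported alongside the beta correlations.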

MLMA and GENESIS were also used for 
generalization to data from 9136 individuals 
from the Adolescent Brain Cognitive Devel- 
opment (ABCD) Study (table S1), given the 
high degree of admixture and relatedness in 
this sample. Generalization to ABCD was quite 
high, as can be observed through significant 
beta correlations (r range: 0.46 to 0.92) (fig. S2),
sign concordance rate, and proportion of variants 
replicated after correction for multiple com- 
parisons (12). This suggests that the genetic 
architecture of the cortex found in adulthood 
is largely generalizable to earlier life stages 
of neurodevelopment, particularly for surface 
area. We also examined correspondence be- 
tween the two datasets by calculating genetic 
correlations with LD score regression (LDSC) 
for each region. Eighteen of 24 phenotypes 
were significantly genetically similar between 
ABCD and UKB (genetic correlation, r_g, range:
0.38 to 1.21) (fig. S3).

Given the evidence of comparable results, 
we ran a joint meta-analysis of the three sam- 
ples using METAL (12). After clumping each 
phenotype separately to obtain independent 
loci, the meta-analysis revealed 800 genome- 
wide significant regional loci, with 467 passing 
correction for multiple comparisons (table S5). 
Of 800 loci, 526 were found to be independent 
by merging hits from these 26 phenotypes into 
one file and clumping with PLINK (8) (R^2 =
0.1, 250 kb). With the exception of one SNP, 
all had a nonsignificant heterogeneity P value 
(P>1x 10°) associated with Cochran’s Q 
statistic, suggesting comparability among 
samples. SNPs from the meta-analysis with 
a significant heterogeneity P < 1x 10~° are 
listed in table S6. 
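
Cochran's Q for a single SNP across three samples can be sketched as follows. This is a hedged illustration rather than a description of METAL's internals: it assumes inverse-variance (standard-error) weighting, and the effect sizes and standard errors are hypothetical. With three samples, Q has 2 degrees of freedom, for which the chi-square survival function is exactly exp(-Q/2).

```python
from math import exp


def cochran_q(betas, ses):
    """Return the inverse-variance fixed-effect estimate and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    beta_meta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - beta_meta) ** 2 for w, b in zip(weights, betas))
    return beta_meta, q


def q_pvalue_three_samples(q):
    """Heterogeneity P value for Q with 2 degrees of freedom (three samples)."""
    return exp(-q / 2.0)


# HYPOTHETICAL per-sample effects and standard errors for one SNP.
betas = [0.050, 0.046, 0.058]
ses = [0.010, 0.020, 0.018]

beta_meta, q = cochran_q(betas, ses)
print(f"beta = {beta_meta:.4f}, Q = {q:.3f}, P_het = {q_pvalue_three_samples(q):.3f}")
```

A large heterogeneity P value, as in this toy case, indicates that the per-sample effects are compatible with one shared effect.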
Comparison to previous cortical GWAS

We used conditional and joint analysis (COJO) 
(13) to identify novel loci compared with the 
most recent GWAS of cortical architecture 
which identified 369 loci (3). Of these loci, 
206 were found to be independent by clumping
all 70 phenotypes together (R^2 = 0.1, 250 kb).
COJO revealed that 63.6% of our 393 regional 
variants remained genome-wide significant and 
thus are considered novel associated variants 
(12) (table S7).


Assigning SNPs to genes and neuropsychiatric
implications


All SNPs in LD (R^2 > 0.6) with the 393 regional
variants were mapped to genes using positional,
gene expression [expression quantitative
trait locus (eQTL)], and chromatin interaction
information in FUMA SNP2GENE (9). This 
mapped our genetic variants to 915 genes 
(tables S8 and S9). MAGMA gene-based analyses 
yielded 575 significant genes (mean χ^2 statistics,
P < 2.6 × 10^-6) (table S10). According to
the National Institutes of Health Genetics Home 
Reference, many significant genes are related to 
neurodevelopmental disorders (autism, epilepsy, 
microcephaly) or dementia (table S11). 
Further support for this conclusion was 
determined by investigating the shared genetic 
effects between our brain phenotypes and 
disorders by estimating genetic correlations 
through LDSC (fig. S4 and table S12). We found 
a significant association between global surface 
area and attention-deficit/hyperactivity disorder
(ADHD) after multiple comparison correction,
as well as nominal significant associations 
[e.g., temporal area with schizophrenia and 
autism spectrum disorder (ASD)]. To examine 
putative causal association, we performed Mendelian
randomization (14) on global area and
ADHD that showed the most significant r_g, and
we did not find evidence of causality. We also 
examined ASD, a neurodevelopmental disorder 
with early onset, and its relationship with anteromedial
temporal area indexed by a significant
r_g. We found a significant unidirectional
causation (0, = —0.36, P = 9.5 x 10°), indi- 
cating that decreased anteromedial temporal 
area may cause ASD. These SNPs could be missed
in classical GWAS of ASD, but nevertheless are
important genetic factors in the pathogenesis
of this disorder through their contributions to
anteromedial temporal morphology.

Fig. 1. Manhattan plots of genetic variants underlying surface area and cortical
thickness. Results are shown separately for surface area (A) and cortical
thickness (B). Numbers on brain atlases represent each brain region. Plots are
color coded by brain atlas region. Numbers of significant genetic loci are listed
in Manhattan subplot titles, with the horizontal dashed line denoting genome-wide
significance. Vertical bar charts show breakdown of genomic position of
SNPs, with corresponding legend at the top of (A). ncRNA, noncoding RNA;
UTR3, 3′ untranslated region; UTR5, 5′ untranslated region.

Fig. 2. Genetic architecture of the cortex. Cortical phenotypes generally have low polygenicity,
medium to high heritability, and are under strong negative selection. Vertical black lines on each
plot are average reference lines for relevant estimates of commonly studied traits taken from (16).
Numbering of regions follows labels in Fig. 1. SA, surface area; CT, cortical thickness; π, polygenicity;
S, selection; h^2, heritability.


Genetic architecture of the cortex 


Compared with other common complex traits, 
cortical phenotypes tend to have low polygenicity
(proportion of genome-wide SNPs with non-null
effects; range: 0.0038 to 0.040; area: π =
0.0085 ± 0.0011; thickness: π = 0.015 ± 0.0039)
and average-to-high SNP-based heritability
(range: 0.14 to 0.37; area: h^2 = 0.27 ± 0.012;
thickness: h^2 = 0.20 ± 0.011) (Fig. 2). Pedigree-based
heritability for the UKB discovery sample
(range: 0.31 to 0.95), calculated with multiple
genetic relatedness matrices (15), and twin-based
heritability approximated by Falconer’s
formula from the ABCD sample (range: 0.39 
to 0.96) can be found in table S13. Negative 
selection signatures can be inferred from the 
relationship between minor allele frequency 
and effect size, quantified by the S parameter 
implemented in SBayesS (16) (fig. S5). We
found that loci associated with our cortical 
phenotypes may be under strong negative 
selection pressures (16) compared with pheno- 
types with similar levels of heritability and 
polygenicity (range: -0.99 to 0.045; area: S =
-0.79 ± 0.11; thickness: S = -0.72 ± 0.18). It
should be noted that π is slightly dependent
on sample size and thus should be interpreted
with caution. However, others have
shown similar estimates of polygenicity for 
brain phenotypes (17). 
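
Falconer's formula, used above for the twin-based estimates, approximates heritability as twice the difference between the monozygotic and dizygotic twin correlations, h^2 = 2(r_MZ - r_DZ). The sketch below shows the calculation; the twin correlations are hypothetical values chosen for illustration, not estimates from the ABCD sample.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's heritability estimate: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# HYPOTHETICAL twin correlations, for illustration only.
h2 = falconer_h2(r_mz=0.80, r_dz=0.45)
print(f"h^2 = {h2:.2f}")
```

Because the estimate is a scaled difference of two correlations, sampling error in either correlation is doubled, so values near or above 1 can occur for highly heritable phenotypes in modest twin samples.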


Partitioned heritability 


Different functional regions of the genome 
can contribute disproportionately to complex 
human traits. Thus, we applied stratified LDSC 
regression to partition heritability estimates 
of our 24 cortical phenotypes for 97 annota- 
tions from the baseline model (18, 19) (table 
S14), from which we focused on enriched annotations
where regression coefficients are
significantly positive (z > 1.96, two-tailed P < 
0.05). We classified the annotations into three 
categories determined from conserved, devel- 
opmental, and regulatory genomic partitions. 
We found seven conserved annotations (found 
in primates and other mammals) to be signi- 
ficantly enriched after multiple comparison 
correction (P < 0.0025, where P < 0.05/t_e) (Fig. 3A
and table S15) across 16 cortical phenotypes, 
with notable enrichment for seven phyloge- 
netically conserved cortical regions (e.g., me- 
dial temporal lobe, motor and orbitofrontal 
regions) for the annotation “ancient sequence 
age human promoter.” This conserved pro- 
moter annotation reflects a genomic region 
that is evidenced to have existed before the
evolutionary split of marsupial and placental
mammals (18).
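
The z > 1.96 cutoff quoted above for significantly positive regression coefficients corresponds to a two-tailed P of 0.05 under a standard normal distribution. A minimal sketch of that conversion, using only the Python standard library (the function name is illustrative):

```python
from math import erfc, sqrt


def two_tailed_p(z):
    """Two-tailed P value for a standard normal z score."""
    return erfc(abs(z) / sqrt(2.0))


print(f"z = 1.96 -> P = {two_tailed_p(1.96):.3f}")
```

The same function can be used to translate any other z cutoff into its two-tailed P value before applying a multiple comparison correction.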

Seven regions, mostly indexing surface area, 
were significantly enriched for developmental 
annotations of fetal deoxyribonuclease I 
(DNase I) hypersensitive sites (DHSs), a marker
of accessible chromatin (20), along with enrich- 
ment of 15 cortical phenotypes for 13 regulatory 
annotations (table S15). We performed an ad- 
ditional partitioned heritability analysis using 
differential methylation regions (DMRs) that 
were previously found to be associated with 
present-day humans compared to Neanderthal 
and Denisovan genomes (21). Perisylvian thickness
was nominally enriched for present-day
human DMRs (LDSC Jackknife test, P =
0.03). By partitioning the genome into mean- 
ingful functional categories, we capture pat- 
terns of hierarchical brain organization with 
evolutionarily conserved (paralimbic, sensory 
motor) regions enriched for conserved and 
developmental annotations and association 
areas more strongly associated with regulatory 
annotations. 


Gene Ontology enrichment 


To elucidate the biological pathways associated 
with our discovered genetic variants, MAGMA- 
mapped genes were input into the Molecular 
Signatures Database to obtain Gene Ontology 
(GO) terms. Twenty-six GO terms, predominantly 
related to neurodevelopment, were significantly 
associated with our brain phenotypes after 
Bonferroni correction (Fig. 3B and table S16). 
Notable biological pathways included WNT/ 
beta-catenin, TCF, FGF, and hedgehog signaling, 
which are important for axis specification and 
areal identity (1). For higher-order association
regions, the dorsolateral prefrontal cortex was 
linked to cortical tangential migration. 


Three-dimensional genetic characterization of 
the cortex 


To better understand the relationship between 
our cortical phenotypes, we computed pheno- 
typic and genetic correlation matrices using 
LDSC (Fig. 4A). Significant correspondence was 
observed between matrices (Mantel test: r =
0.85, P = 0.001), suggesting substantial genetic 
influences on cortical patterning. Hierarchical 
clustering was applied to genetic correlations 
of area and thickness separately, revealing a 
clear separation in genetic architecture between 
A-P divisions in area and between D-V divisions 
in thickness (Fig. 4, A and B, and fig. S6). Re- 
gions anatomically closer to each other tended 
to be more correlated with each other. However, 
homologous regions in contralateral hemispheres 
had high genetic correlations despite their phys- 
ical distance (4) (table S18). 
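
The Mantel test used above to compare the phenotypic and genetic correlation matrices can be sketched as a permutation test. This is a generic, minimal version written under stated assumptions, not the authors' implementation: it uses the Pearson correlation of the upper triangles, permutes the region labels of one matrix, and the two 5 × 5 matrices are hypothetical toy data.

```python
import random
from math import sqrt


def upper_triangle(m):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(a, b, n_perm=999, seed=0):
    """Permutation Mantel test between two symmetric matrices."""
    rng = random.Random(seed)
    n = len(a)
    observed = pearson(upper_triangle(a), upper_triangle(b))
    hits = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Permute rows and columns of b together (relabel the regions).
        permuted = [[b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(a), upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# HYPOTHETICAL correlation matrices for five regions.
pheno = [[1.0, 0.8, 0.6, 0.2, 0.1],
         [0.8, 1.0, 0.7, 0.3, 0.2],
         [0.6, 0.7, 1.0, 0.4, 0.3],
         [0.2, 0.3, 0.4, 1.0, 0.9],
         [0.1, 0.2, 0.3, 0.9, 1.0]]
genetic = [[1.0, 0.7, 0.5, 0.3, 0.2],
           [0.7, 1.0, 0.6, 0.2, 0.1],
           [0.5, 0.6, 1.0, 0.5, 0.2],
           [0.3, 0.2, 0.5, 1.0, 0.8],
           [0.2, 0.1, 0.2, 0.8, 1.0]]

r, p = mantel(pheno, genetic)
print(f"Mantel r = {r:.2f}, P = {p:.3f}")
```

Permuting rows and columns together preserves the dependence among entries that share a region, which is why a Mantel test is used rather than an ordinary correlation test on the matrix entries. With 999 permutations, the smallest attainable P value in this sketch is 0.001.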

Fig. 3. Partitioned heritability and Gene Ontology (GO) enrichment.
(A) Heritability of cortical phenotypes is significantly enriched for conserved,
developmental, and regulatory annotations. The river plot depicts mapping
between significant annotations (18) and cortical phenotypes. Color coding of
the river plot is based on -log10 enrichment P values. (B) Significantly

Fig. 4. Three-dimensional genetic characterization of the cortex.
(A) Phenotypic and genotypic correlations between 24 regions, ordered by
hierarchical clustering that shows A-P divisions for area and D-V divisions for
thickness. Phenotypic correlations are in bottom left triangle, and genetic
correlations are in the upper right. SMA, supplementary motor area.
(B) Pleiotropic SNP counts for each pair of regions, using the same ordering
as in (A). Agonistic or same direction of effects are in the lower red triangle,
antagonistic or opposing effects are in upper blue triangle. (C) Brain maps
of standardized effects of each latent factor (F1 and F2) derived from genomic
SEM on each brain region. See fig. S7 for detailed statistics.

Fig. 5. Enrichment of cell type-specific accessible chromatin sites and
fine-mapping to regulatory regions of genes. (A) Heatmap of enrichment for
cortical phenotypes and cell type-specific accessible chromatin peaks.
Phenotypes also include three metabolic (blood glucose, body mass index, and
blood pressure) and three cortical-related (multiple sclerosis, Alzheimer’s
disease, and depression) controls. Vertical black line differentiates M1 cell types
(left) from organoid developmental stages (right). Significant values are based on
the bias-corrected enrichment statistic from g-chromVAR (12). (B) Mapped
genes and the regulatory region (blue, enhancer; red, promoter) of the causal
SNPs carried forward by positively enriched M1 cell type-cortical phenotype
pairs (z > 2.36, P < 0.01). Size of dot reflects probability of SNP being causal.
Colors represent peak to gene coaccessibilities, where a score of 1 reflects a
peak being in the gene's promoter region. (C) A selected pleiotropic SNP
(rs2696555) influencing both orbitofrontal area and ventral frontal thickness,
mapped to target genes on the basis of coaccessibility with M1. Cell types
are outlined in table S23.

Given the observed correlations, we sought
to estimate the shared genetic effects across
phenotypes with genomic structural equation
modeling (SEM) (Fig. 4C and figs. S7 and S8)
(12, 22). We found that two-factor models fit
our data well (comparative fit index of >0.98).
The two latent factors recapitulated the A-P
and D-V gradations of cortical patterning for
area and thickness, respectively. The strongest
association signals between the latent factors
and variants reside in the 17q21.31 inversion
region for area (P < 148 x 10°°°), and more
widespread effects across the genome with
notable peaks on chromosomes 3 and 17 for
thickness (P < 3.39 x 10~””) (fig. S8). We further
performed association testing of inversion
polymorphisms on 17q21.31 with our cortical
phenotypes (table S19). We found the inverted
allele to be highly associated with overall surface
area reductions, with stronger effects in posterior
regions along the A-P gradient and a modest
positive correlation with increasing thickness in
ventral regions. The opposing effects on area
and thickness may in part account for the observation
of a modest negative association between
area and thickness (“cortical stretching”)
after accounting for total brain size (23).

After extracting salient latent factors underlying
multiple brain regions, we searched for
pleiotropic loci between pairs of regions. We
used COJO to map SNPs with potential pleiotropic
effects (i.e., that influence two regions),
defined by the loci of region i that were no
longer genome-wide significant when conditioned
on the loci of region k (13). Using this
approach, we found that 107 of our 393 loci
had pleiotropic effects on two phenotypes (Fig.
4B and table S20). Surface area of parietal and
posterolateral temporal regions shared eight SNPs
with antagonistic effects (i.e., increasing area
of one region while decreasing area of the
other); these regions show good correspondence
between ABCD and UKB and are both
enriched for fetal DNase hypersensitive sites
(Fig. 3A). Two of these antagonistic SNPs,
rs10878269 and rs142166430, are intronic variants
of methionine sulfoxide reductase B3
(MSRB3), a gene that is important for protein
repair and metabolism (24).

We also noted antagonistic pleiotropic effects
of two SNPs, rs12676193 and rs6986885,
in the 8p23.1 inversion polymorphism linked
to motor-premotor area and perisylvian thickness
(table S20). These SNPs were mapped to
methionine sulfoxide reductase A (MSRA), a
gene that is important for repair of oxidatively
damaged proteins (25). Further, the 8p23.1
region is considered to be a potential hub for
neurodevelopmental and psychiatric disorders
(26). Another notable SNP with pleiotropic
effects was rs888812, with antagonistic ef- 
fects on precuneus and prefrontal area. This 
and other variants were mapped to NR2F1/ 
COUP-TFI, a transcription factor influencing 
A-P patterning of the cortex in development (1).


Enrichment of cell type-specific accessible 
chromatin sites and fine-mapping to regulatory 
regions of genes 


To map putative causal genes for our genetic 
variants—motivated by observed enrichment 
of our phenotypes for regulatory genomic 
regions—we computed cell type-specific enrich- 
ment for our fine-mapped GWAS SNPs on the 
basis of high-resolution accessible chromatin 
sites drawn from human primary motor cortex 
(M1) (27) and cerebral organoid data (28) using 
g-chromVAR (fig. S9). To quantify enrichment, 
we computed the accessibility deviations as 
the expected number of feature counts per 
peak per cell type, weighted by the fine- 
mapped variant posterior probabilities. This 
revealed 11 significantly positively enriched 
cell type-phenotype pairs after Bonferroni cor- 
rection (z > 2.8, P < 0.0025) (Fig. 5A), including 
enrichment of the motor-premotor region for
accessible chromatin sites in oligodendrocyte
precursor cells (OPCs). This result is particularly
compelling given that OPCs give rise to
mature oligodendrocytes which in turn mye- 
linate axons in the central nervous system, 
and the motor cortex is known to be a region 
rich in intracortical myelin content (29). In 
control analyses, no significant enrichment 
was found for metabolic traits, suggesting that 
this approach is specific to cortical phenotypes. 
This approach is further supported by the con- 
sistent finding of the significant Alzheimer’s- 
microglia pair (30). 

For each significant M1 cell type-phenotype 
pair from Fig. 5A, we identified putative causal 
genes from a locus’s genomic position relative 
to its gene targets and chromatin coaccessi- 
bility relationships (i.e., both the genomic 
locus and its gene target were simultaneously 
accessible). From the initial 25 target genes, 
five distal and two proximal genes remained 
(Fig. 5B) after filtering out genes with weak 
evidence of gene expression in the corresponding 
cell type (fig. S10 and table S21).

We applied the same mapping approach to 
pleiotropic SNPs and found three SNPs that 
overlapped with the M1 accessible chromatin 
peaks (Fig. 5 and table S22). Notably, rs2696555, 
a SNP in the 17q21.31 inversion region, was 
associated with increases in orbitofrontal area 
and ventral frontal thickness and mapped to 
the promoter region of GRN, a granulin 
precursor that helps preserve neuronal survival, 
axonal outgrowth, and neuronal integrity 
through its impact on inflammatory processes 
in the brain (31). This SNP was also mapped at
a distal putative enhancer site of FZD2, which 
encodes a Frizzled receptor within the WNT/ 
beta-catenin pathway and is expressed in 
cortical progenitor cells of the dorsal and 
ventral telencephalon of the developing brain 
(32). A schematic of how this single variant 
could influence area and thickness is depicted 
in Fig. 5C. 


Discussion 


This study advances understanding of the gen- 
etic architecture underlying the organization 
of the cerebral cortex and uniquely human 
traits. Our genetically informed atlases en- 
hanced discovery of significant loci compared 
with previous cortical GWAS with traditional 
nongenetic atlases (3, 6). The improved dis- 
covery is likely aided by the fact that our 
atlases conform to genetic cortical patterning 
(4, 5), thereby increasing discoverability and 
heritability, while also having lower polygenicity. 

Making use of two large cohorts of adults and 
children, we found that many genetic variants 
in our findings pinpoint genetic mechanisms 
influencing cortical patterning of the human 
brain in early development. Our data, partic- 
ularly findings with COUP-TFI, support the 
protomap hypothesis whereby genes hold spatial 
and temporal instructions to initiate a cortical 
map by graded signaling from patterning centers
in early development (1, 2). Our results are
consistent with reports of loss of COUP-TFI 
function leading to expansion of frontal motor 
areas at the expense of posterior sensory areas 
in the rodent brain (1), which is intriguing
given the challenges in defining rodent-specific 
versus human-specific developmental mech- 
anisms. These variants are promising candi- 
dates for future functional experiments. 

We also uncovered latent factors describing 
our area phenotypes, suggesting genetic effects 
related to inversion polymorphisms. Recurrent 
inversions of genomic regions, such as 17q21.31 
identified here along with 8p23, have occurred
through primate evolution and show that the
inverted orientation is the ancestral state.
Specifically, both 17q21.31 and 8p23 inversions
appear to have occurred independently within
the Homo and Pan lineages (33, 34). The 17q21.31
inversion contains microtubule-associated protein
tau (MAPT), a risk gene for neurodegeneration
(35). The inverted (minor) allele has
been associated with lower susceptibility for 
Parkinson’s dementia but higher predisposition 
to developmental disorders (33). 

We linked several of our findings to the 
WNT/beta-catenin pathway, which regulates 
cortical size by controlling whether progen- 
itors continue to proliferate or exit the cell 
cycle to differentiate (36). Cell proliferation is 
thought to exponentially enlarge the progen- 
itor pool and the number of cortical columns, 
which results in expansion of cortical surface 
area and gyrification. On the other hand, cor- 
tical thickness is largely determined by cell dif- 
ferentiation and a linear production of neurons 
within each cortical column (2, 36). In addition 
to 17q21.31, our results revealed loci linked to 
various cortical regions in this pathway (e.g., 
WNT3, GSK3B), and their combined interactive 
effects may be differentially involved in shaping 
area and thickness. 

The brain is particularly vulnerable to in- 
sults (genetic and environmental) during sen- 
sitive periods of neurodevelopment, and changes 
during this time can have lasting impacts on 
the brain over the life span. This perspective 
helps situate our findings of predominantly 
negative selection acting on our cortical pheno- 
types (Fig. 2), which may be linked to conserved 
genomic loci and those enriched for neuro- 
psychiatric diseases (18, 19). Here we uncovered 
a putative causal relationship of reduced ante- 
romedial temporal area potentially giving rise to 
ASD. The medial temporal lobe has been linked 
to abnormal connectivity in some types of ASD 
and houses structures (e.g., amygdala, hippo- 
campus) important in regulating emotion and 
social behaviors (37). We also found this re- 
gion to be enriched for accessible chromatin 
sites in inhibitory neurons; thus, these findings 
may provide clues to the long-standing theory of 
excitatory-inhibitory imbalance in ASD (38). 

Intriguingly, most of our phenotypes, especially 
paralimbic and sensory motor regions, 
exhibited enriched heritability for conserved 
genomic partitions (Fig. 3A) including promoter 
regions, rather than enhancers, consistent 
with the idea that the former are more 
evolutionarily conserved (18). However, we also 
identified brain regions that have evolved to 
support human-specific behaviors, such as 
language and communication. Differential 
methylation and human-specific SNPs in as- 
sociation with perisylvian thickness lead us 
to speculate that altered morphology of the 
perisylvian region, and potentially also motor- 
premotor regions, were important in the evolu- 
tion of speech articulation (39). 

Our results with genetically informed atlases 
demonstrate that human brain arealization and 
regionalization largely arise from phylogenet- 
ically conserved regions and multiple neuro- 
developmental programs, but that a select 
few regulatory features, some of which may 
be specific to modern-day humans, have had 
widespread downstream effects on brain mor- 
phology and may have given rise to human- 
specific traits and diseases. 

QUANTUM GASES 


Second sound attenuation near quantum criticality 


Xi Li, Xiang Luo, Shuai Wang, Ke Xie, Xiang-Pei Liu, Hui Hu, Yu-Ao Chen, 
Xing-Can Yao, Jian-Wei Pan 


Second sound attenuation, a distinctive dissipative hydrodynamic phenomenon in a superfluid, is crucial for 
understanding superfluidity and elucidating critical phenomena. Here, we report the observation of second 
sound attenuation in a homogeneous Fermi gas of lithium-6 atoms at unitarity by performing Bragg spectroscopy 
with high energy resolution in the long-wavelength limit. We successfully obtained the temperature dependence 
of second sound diffusivity $D_2$ and thermal conductivity $\kappa$. Furthermore, we observed a sudden rise, a 
precursor of critical divergence, in both $D_2$ and $\kappa$ at a temperature of about 0.95 of the superfluid transition 
temperature $T_c$. This suggests that the unitary Fermi gas has a much larger critical region than does liquid helium. 
Our results pave the way for determining the universal critical scaling functions near quantum criticality. 


Second sound, an entropy wave predicted 
by the seminal two-fluid hydrodynamic 
theory (1, 2), directly couples to the superfluid 
order parameter (3-5). In contrast 
to first sound, which is a density wave 
existing both below and above the superfluid 
transition, second sound propagates only in 
the superfluid phase. As a macroscopic man- 
ifestation of heat and momentum diffusion, 
second sound attenuation is characterized by 
several important transport coefficients (6-8), 
such as the shear viscosity $\eta$ and the thermal 
conductivity $\kappa$. In liquid helium, the measurement 
of second sound attenuation and the related 
thermal transport led to the establishment 
of dynamic scaling theory (3, 5, 9-11). How- 
ever, owing to the narrow critical region and 
limited controllability of liquid helium, a deeper 
understanding and quantitative account of crit- 
ical scaling functions remain elusive. Related 
issues arise in a wide range of strongly corre- 
lated materials such as high-temperature super- 
conductors, where the anomalous charge and 
energy transport near quantum criticality is 
not well understood (12, 13). 

Ultracold fermionic atoms in the strongly 
interacting limit, that is, the unitary Fermi gas 
(14), offer great promise for studying the sec- 
ond sound attenuation and elucidating the 
critical phenomena. First, as a consequence of 
scale invariance, the thermodynamic and dy- 
namic properties of the unitary Fermi gas are 
universal functions of the reduced temperature 
$T/T_F$ (15-20). Here, the Fermi temperature 
$T_F = \hbar^2 k_F^2/(2 m k_B)$ is determined by the 
atomic mass $m$, the density $n$, and the Fermi 
wave number $k_F = (3\pi^2 n)^{1/3}$; $k_B$ and $\hbar$ denote 
the Boltzmann and reduced Planck constants, 
respectively. Thus, the second sound diffusivity 
$D_2$ and the thermal conductivity $\kappa$ at temperature 
$T$ are similarly universal functions of 
$T/T_F$. Second, thanks to the unprecedented 
controllability (14), the critical region of the 
unitary Fermi gas can be precisely probed to 
investigate the critical transport behaviors. 
Over the past decades, great efforts have been 
devoted to probing the sound propagation and 
attenuation in the unitary Fermi gas. The sec- 
ond sound propagation has been observed in a 
highly elongated harmonic trap (21), but the 
attenuation remains undetermined because of 
the density inhomogeneity. The first sound at- 
tenuation has been recently measured by con- 
fining the unitary Fermi gas into a box potential, 
eliminating the inhomogeneity problem (22). 
However, observing the second sound attenu- 
ation is challenging because the signal is too 
weak to be resolved from noise. 

Here, we measured second sound attenua- 
tion in a homogeneous unitary Fermi gas of 
$^6$Li atoms (23, 24) with extremely large Fermi 
energy by developing a Bragg spectroscopy 
technique with small wave number $k$ and high 
energy resolution. We successfully determined 
the second sound diffusivity and the thermal 
conductivity of the unitary Fermi gas; the superfluid 
fraction and the shear viscosity are also 
obtained with improved accuracy. In the superfluid 
phase, $D_2$ and $\kappa$ attain the universal 
quantum values of $\hbar/m$ and $n\hbar k_B/m$, respectively. 
Near the superfluid transition, a sudden 
rise in $D_2$ and $\kappa$ is observed, consistent with 
the critical divergence phenomena predicted 
by the dynamic scaling theory (3-5). We find a 
surprisingly large quantum critical region 
characterized by $|t| < 0.05$, where the dimensionless 
temperature $t = 1 - T/T_c$ measures the 
proximity to the superfluid transition temperature 
$T_c$. Our measurements accomplish a 
quantitative experimental examination of the 
dissipative two-fluid hydrodynamic theory for 
the strongly interacting Fermi gas. Furthermore, 
the observed universal transport coefficients can 
provide insight into the anomalous transport 
of strongly correlated materials such as the 
cuprates (13) and provide a benchmark for 
many-body theories (25). 


Experimental scheme and setup 


The measurement of first and second sound 
rests on the dissipative two-fluid hydrodynamic 
theory for the density response function at 
wave number $k$ and frequency $\omega$ (6-8, 26): 

$$\chi_{nn}(k, \omega) = \frac{n k^2}{m}\,\frac{\omega^2 - v^2 k^2 + i D_s k^2 \omega}{(\omega^2 - c_1^2 k^2 + i D_1 k^2 \omega)(\omega^2 - c_2^2 k^2 + i D_2 k^2 \omega)} \qquad (1)$$

which is deduced from the conservation laws 
for momentum and energy. Here, $c_1$ ($c_2$) and $D_1$ 
($D_2$) are, respectively, the speed and diffusivity 
of first (second) sound, whereas $v$ and $D_s$ are, 
respectively, the speed and diffusivity associated 
with thermodynamic and transport properties 
of the system (6-8, 26). In the superfluid 
phase, two propagating sound waves with attenuation 
or damping rate $\Gamma_i = D_i k^2$ ($i = 1, 2$) 
can be clearly identified near the two poles 
$\omega_i = c_i k$ in the response function, which can 
be expressed as $\chi_{nn}^{(i)} \sim Z_i/(\omega^2 - c_i^2 k^2 + i\Gamma_i \omega)$ 
with weight $Z_i$; above the superfluid transition, 
the critical second sound becomes a thermally 
diffusive mode $\chi_{nn}^{(2)} \sim 1/(\omega + i D_2 k^2)$, and the 
diffusivity $D_2 = \kappa/(m n c_p)$ is fully characterized 
by the thermal conductivity $\kappa$ and the 
specific heat at constant pressure $c_p$. 
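
As an illustration of how the response function of Eq. 1 can be evaluated numerically, the following minimal Python sketch implements it as a complex-valued function. The parameter values are arbitrary dimensionless numbers chosen only for demonstration (they are assumptions, not fitted values from the experiment), and the function name `chi_nn` is hypothetical. The sketch also checks the static limit that follows directly from Eq. 1, namely $\chi_{nn}(k, 0) = -n v^2/(m c_1^2 c_2^2)$.

```python
import numpy as np

def chi_nn(omega, k, n, m, c1, c2, v, D1, D2, Ds):
    """Two-fluid density response function in the form of Eq. 1 (complex-valued)."""
    k2 = k * k
    numerator = omega**2 - v**2 * k2 + 1j * Ds * k2 * omega
    first_sound = omega**2 - c1**2 * k2 + 1j * D1 * k2 * omega
    second_sound = omega**2 - c2**2 * k2 + 1j * D2 * k2 * omega
    return (n * k2 / m) * numerator / (first_sound * second_sound)

# Illustrative dimensionless parameters (assumed for demonstration only).
params = dict(k=0.058, n=1.0, m=1.0, c1=0.40, c2=0.05, v=0.06,
              D1=1.0, D2=1.5, Ds=1.2)

# Modulus of the response on a frequency grid, the quantity probed experimentally.
omega_grid = np.linspace(0.0, 0.04, 4001)
spectrum = np.abs(chi_nn(omega_grid, **params))

# Static limit obtained by setting omega = 0 in Eq. 1.
static_value = chi_nn(0.0, **params)
expected_static = -params["n"] * params["v"]**2 / (
    params["m"] * params["c1"]**2 * params["c2"]**2)
```

The modulus `spectrum` is nonzero at zero frequency because the static value is purely real and negative, which is the feature later compared with the compressibility sum rule.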

It is notable that the simple form of Eq. 1 is 
applicable for quantum liquids with strong correlations 
(6), provided that $k$ and $\omega$ are small 
in comparison with the inverse correlation 
length $\xi^{-1}$ and inverse collision time $\tau^{-1}$. However, 
a careful experimental validation of this 
theory is difficult in liquid helium for two 
reasons: (i) The narrow critical region is difficult 
to reach by Brillouin scattering (9), and (ii) 
the second sound weight $Z_2$ determined by the 
Landau-Placzek ratio $\epsilon_{LP}$ is very small; here, 
$\epsilon_{LP} = c_p/c_v - 1$, with $c_v$ being the specific heat 
at constant volume. In this work, we accomplish 
this by developing a high-resolution Bragg 
spectroscopy technique with small $k$ to probe 
the density response of a homogeneous unitary 
Fermi gas, in which $Z_2$ and $\epsilon_{LP}$ turn out to be 
sizable near the superfluid transition (26). 

In Bragg spectroscopy (27-29), the density 
response function links the density fluctuation 
$\delta n(k, \omega)$ (i.e., the response of the system) to an 
applied weak potential perturbation $\delta V(k, \omega)$ 
via $\delta n(k, \omega) = \chi_{nn}(k, \omega)\,\delta V(k, \omega)$. To achieve a 
good signal $\delta n$, we create a high-density spin 
mixture of $^6$Li atoms, which are equally populated 
in the lowest two hyperfine states at 
832.18 G [i.e., the unitarity, where the s-wave scattering 
length diverges (30)]. 

Fig. 1. Creation of sound waves. (A) Sketch of the experimental setup. The rectangular-box trap consists 
of a square tube and two sheets of 532-nm laser beams, generated by two spatial light modulators. DM, 
dichroic mirror; CCD, charge-coupled device. (B) A pair of 741-nm laser beams with wave vectors ($k_1$, $k_2$) and 
frequencies ($\omega_1$, $\omega_2$) exactly intersect on the homogeneous unitary Fermi gas with a small angle, producing 
a one-dimensional moving-lattice potential. The wave number $k = |k_1 - k_2|$ of the Bragg lattice can be accurately 
determined by the horizontal CCD camera in (A), whereas another vertical CCD camera is used to probe the 
in situ density profile of the cloud. (C and D) First and second sound waves in a unitary Fermi superfluid at 
$0.84(2)T_c$. The top panels show typical single-shot difference images of density response $\delta n$, with the lattice 
frequency $\omega = \omega_1 - \omega_2$ of $2\pi \times 2.1$ kHz for first sound (C) and $2\pi \times 0.3$ kHz for second sound (D) (see text). $n(z)$ 
and $\delta n(z)$ are obtained, respectively, by further integrating the reference and difference images along the 
transverse direction in the dashed box (i.e., region of interest 84.40 μm by 41.39 μm). The bottom panels show 
the normalized density waves $\delta n(z)/n(z)$ along the longitudinal $z$ axis. The solid line is a guide to the eye. 

After using forced 
evaporative cooling in a crossed dipole trap, 
about $1 \times 10^7$ atoms close to $T_c$ are adiabatically 
loaded into a 151 μm by 55 μm by 55 μm 
rectangular-box trap (23, 24, 31). The box trap 
consists of a square tube and two sheets of 
532-nm laser beams, as depicted in Fig. 1A, 
and has a maximal potential depth of about 
$2\pi\hbar \times 160$ kHz. To prepare homogeneous Fermi 
superfluids at various $T/T_c$, where $T_c \approx 0.17T_F$, 
we adiabatically lower the potential depth to 
different final values and hold the trap for an 
additional 500 ms to reach thermal equilibrium. 
We find that the density $n$ and the reduced 
temperature $T/T_c$ decrease monotonically with 
the decreasing potential depth of the box trap. 
For a typical cloud at $T/T_c \approx 0.84$, the realized 
density is $n \approx 1.56 \times 10^{13}$ cm$^{-3}$, the Fermi wave 
number is $k_F \approx 2\pi \times 1.23$ μm$^{-1}$, and the Fermi 
energy is $E_F \approx 2\pi\hbar \times 50.1$ kHz. Two important 
features of our system are worth mentioning: 
(i) The density $n$ decreases by only about 
8%, from $1.64 \times 10^{13}$ cm$^{-3}$ close to $T_c$ to $1.50 \times 
10^{13}$ cm$^{-3}$ at $0.75(2)T_c$. (ii) The 1/e lifetime, 
where e is Euler's number, of the unitary Fermi 
superfluid is quite long, that is, more than 20 s, 
and the heating of the system is very weak. 
This preparation of a homogeneous unitary 
Fermi gas with well-controlled temperature 
and extremely large Fermi energy makes the 
probe of the extremely weak second sound 
response possible (32). 
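
The Fermi wave number and Fermi energy quoted for the typical cloud can be cross-checked from the density alone, using $k_F = (3\pi^2 n)^{1/3}$ and $E_F = \hbar^2 k_F^2/(2m)$. The following Python sketch does this under the assumption of standard CODATA constants and the tabulated atomic mass of $^6$Li; the function name `fermi_parameters` is hypothetical and introduced only for illustration.

```python
import math

HBAR = 1.054571817e-34        # reduced Planck constant (J s)
PLANCK_H = 2 * math.pi * HBAR  # Planck constant (J s)
K_B = 1.380649e-23             # Boltzmann constant (J/K)
AMU = 1.66053906660e-27        # atomic mass unit (kg)
M_LI6 = 6.0151228874 * AMU     # mass of one 6Li atom (kg), tabulated value

def fermi_parameters(density_cm3, mass=M_LI6):
    """Return (k_F in 1/m, E_F in J, T_F in K) for a given density in cm^-3."""
    n = density_cm3 * 1e6                        # convert cm^-3 to m^-3
    k_f = (3 * math.pi**2 * n) ** (1.0 / 3.0)    # Fermi wave number
    e_f = HBAR**2 * k_f**2 / (2 * mass)          # Fermi energy
    t_f = e_f / K_B                              # Fermi temperature
    return k_f, e_f, t_f

k_f, e_f, t_f = fermi_parameters(1.56e13)
k_f_over_2pi_per_um = k_f / (2 * math.pi) * 1e-6   # in units of 2*pi per micrometer
e_f_khz = e_f / PLANCK_H / 1e3                     # E_F / (2*pi*hbar) in kHz

print(f"k_F = 2*pi x {k_f_over_2pi_per_um:.2f} um^-1")
print(f"E_F = 2*pi*hbar x {e_f_khz:.1f} kHz")
print(f"T_F = {t_f * 1e6:.2f} uK")
```

With the quoted density, the computed wave number and energy land close to the values stated in the text, which is a useful sanity check on the units and powers of ten.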

The Bragg lattice potential $\delta V(z, t_B) = 
V_0 \sin(kz - \omega t_B)\,\Theta(t_B)$ is engineered by applying 
a pair of coherent 741-nm laser beams with 
a frequency difference $\omega$ that intersect at the 
location of the gas (see Fig. 1B). Here, $2V_0$ is 
the potential depth, $z$ is the longitudinal axis 
of the cloud, and $\Theta(t_B)$ is the Heaviside step 
function. The laser beams are carefully chosen 
to be far-off-resonant and to have a large beam 
diameter, which is pivotal for minimizing unwanted 
heating during the perturbation and 
ensuring the uniformity of the Bragg lattice 
potential. It is known that the correlation 
length diverges as $\xi \sim k_F^{-1}|t|^{-\nu}$ near the superfluid 
transition with the critical exponent 
$\nu \approx 2/3$ given by the F model (5). Experimentally, 
a small wave number $k = 2\pi \times 0.071$ μm$^{-1}$ = 
$0.058k_F$ is applied by adjusting the intersection 
angle between two lattice lasers. If we 
use the criterion $k\xi \sim 0.058|t|^{-2/3} < 1$, the hydrodynamic 
regime could be reached over a 
wide range of temperatures unless it is very 
close to $T_c$, that is, $|t| < 0.014$. Arising from 
the Bragg lattice potential, the steady-state 
density response takes the form of $\delta n(z, t_B) = 
|\chi_{nn}(k, \omega)|\,V_0 \sin[kz - \omega t_B + \theta(k, \omega)]$, where 
$|\chi_{nn}(k, \omega)|$ and $\theta(k, \omega)$ are the modulus and 
argument of $\chi_{nn}(k, \omega)$, respectively. Experimentally, 
a carefully chosen $V_0$ of about 0.5% $E_F$ 
($1.51 \times 10^{-31}$ J) and perturbation duration of 
3 ms are implemented to satisfy the criteria 
of linear steady-state response. With these 
optimized parameters, the density response 
$\delta n$ at $\omega$ can be acquired by subtracting two 
high-resolution in situ images, which are 
taken at the given $\omega$ and $\omega_{ref} = 2\pi \times 1$ MHz, 
respectively, with the latter being the reference. 
Figure 1, C and D, shows two distinct 
density waves $\delta n(z)/n(z)$ of the superfluid 
at $T \approx 0.84T_c$, that is, first sound at $\omega = 2\pi \times 
2.1$ kHz and second sound at $\omega = 2\pi \times 0.3$ kHz, 
respectively (32). 
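
The window around the transition in which the hydrodynamic criterion fails can be estimated by solving $k\xi = 1$ for $|t|$. The sketch below is a minimal Python illustration, assuming the scaling form $\xi = k_F^{-1}|t|^{-\nu}$ with unit prefactor and $\nu = 2/3$ as stated above; the function names are hypothetical.

```python
def k_xi(k_over_kf, t_abs, nu=2.0 / 3.0):
    """Dimensionless product k*xi, assuming xi = |t|**(-nu) / k_F."""
    return k_over_kf * t_abs ** (-nu)

def hydrodynamic_breakdown(k_over_kf, nu=2.0 / 3.0):
    """Reduced temperature |t| at which k*xi reaches 1 (hydrodynamics fails below it)."""
    return k_over_kf ** (1.0 / nu)

t_min = hydrodynamic_breakdown(0.058)   # boundary of the excluded window in |t|
threshold_t_over_tc = 1.0 - t_min       # corresponding T/T_c below the transition

print(f"|t| boundary: {t_min:.3f}")
print(f"hydrodynamic regime for T/T_c below about {threshold_t_over_tc:.3f}")
```

For $k = 0.058k_F$ this gives a boundary near $|t| \approx 0.014$, in line with the estimate quoted in the text.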

Two key technical advantages of our Bragg 
spectroscopy are worth noting (32): (i) The 
modulus $|\chi_{nn}(k, \omega)|$ can be directly obtained 
from the integration of $|\delta n(z)/n(z)|$ as a function 
of $\omega$ so that we avoid potential errors 
owing to the imperfect phase synchronization 
for acquiring $\mathrm{Im}\,\chi_{nn}(k, \omega)$ from out-of-phase 
density response. (ii) A steady-state density 
response is taken, and thus the finite perturbation 
duration neither broadens the spectrum 
nor limits the frequency resolution 
in our experiment. These two advantages, 
combined with the ability to prepare a 
homogeneous Fermi gas with extremely large 
$E_F$ as mentioned earlier, enable us to measure 
the density response function $\chi_{nn}$ with high 
signal-to-noise ratio. 


Density response spectra 


The density response spectra $|\chi_{nn}(k, \omega)|$ over a 
wide temperature range of $0.42 < T/T_c < 1.04$ 
were systematically measured. The spectra from 
$0.75(2)T_c$ to $1.04(2)T_c$ are presented in Fig. 2, 
accompanied by the fittings to Eq. 1 (solid 
lines). A high-frequency first sound peak at 
$\hbar\omega \approx 0.046E_F$ is clearly visible over the whole 
temperature range, varying smoothly across 
the superfluid transition. More importantly, 
a second sound peak can be unambiguously 
identified at low frequency $\hbar\omega < 0.01E_F$. This 
is particularly evident in the temperature range 
of $0.79 < T/T_c < 0.94$, as highlighted in the left 
panel of Fig. 2A. There is a notable change 
in the second sound peak from $0.98(2)T_c$ to 
$1.01(2)T_c$: The line shape becomes diffusive 
and shoulder-like with much reduced height 
(see the left panel of Fig. 2B), indicating that 
second sound is the critical mode characterizing 
the superfluid transition. Two intriguing 
features of the spectra are worth mentioning: 
(i) $|\chi_{nn}(k, \omega)|$ is nonzero at $\omega = 0$, which 
is contributed by the real part of $\chi_{nn}(k, \omega)$ 
and agrees with the compressibility sum rule 
(6), $\chi_{nn}(k, \omega = 0) = -n/(m v_T^2)$. Here, $v_{T(s)} = 
\sqrt{(\partial P/\partial n)_{T(s)}/m}$ is the isothermal (adiabatic) 
sound speed. (ii) The coupling between first 
and second sound is appreciable, and thus 
the sound peaks do not show the symmetric 
Lorentzian line shape expected for propagating 
sound. To recognize the respective 
contributions of the first and second sound 
responses, the imaginary parts of $\chi_{nn}^{(1)}(k, \omega)$ 
and $\chi_{nn}^{(2)}(k, \omega)$ are reconstructed using the fitting 
results of sound speed and diffusivity. A 
well-defined propagating second sound with a 
Lorentzian line shape is then observed for temperatures 
up to a threshold value of $0.98(2)T_c$ 
(see fig. S10). 

Fig. 2. Cascade plots of density response spectra at various temperatures. (A) The spectra from $0.75(2)T_c$ to $0.94(2)T_c$ (top to bottom) are shown on the right. 
The two subplots on the left give a zoomed-in view of the low-frequency second sound response at $0.84(2)T_c$ and $0.94(2)T_c$, respectively. (B) The spectra from 
$0.97(2)T_c$ to $1.04(2)T_c$ (top to bottom), with the second sound response highlighted on the left. Every data point corresponds to an average value of about 30 to 
50 independent results, each obtained from a measured single-shot density wave similar to the one shown in Fig. 1, C or D (32). The error bars represent one standard 
deviation. The solid lines are the fitting curves, obtained by using Eq. 1. 

The threshold temperature $0.98(2)T_c$ is consistent 
with $T < 0.986T_c$, an estimate based 
on the hydrodynamic criterion $k\xi < 1$. The 
confirmation of two-fluid hydrodynamics is 
also supported by the excellent curve fittings in 
the temperature range of $0.75 < T/T_c < 0.98$, 
as reported in Fig. 2, allowing us to accurately 
determine the sound speed $c_i$ and diffusivity $D_i$. 
Moreover, to independently validate the reach 
of the hydrodynamic regime, a series of wave 
numbers $k$ around $0.058k_F$ are implemented to 
measure $|\chi_{nn}(k, \omega)|$, and $\Gamma_i = D_i k^2$ is achieved 
with nearly the same sound speed and diffusivity 
(32). 

However, for temperatures from $0.99(2)T_c$ 
to $1.01(2)T_c$, the two-fluid hydrodynamic model 
becomes inadequate. Although the sound 
speeds and diffusivities can still be acquired 
from the curve fitting to Eq. 1, a more accurate 
determination requires a nonperturbative dynamic 
scaling analysis (3-5), in which the critical 
second sound response is a universal 
function (33) of $\omega/(a k^{3/2})$ at fixed values of 
$k\xi$. Here, the constant $a$ sets the energy scale 
in the critical regime. Finally, we mention that 
the second sound cannot be resolved in the 
spectra for $T < 0.75T_c$ (see fig. S6 for an example) 
for two possible reasons. First, the 
Landau-Placzek ratio $\epsilon_{LP}$ becomes smaller 
(7, 26), leading to a negligible second sound 
weight $Z_2$ in the density response spectrum 
$|\chi_{nn}(k, \omega)|$. Second, there is a transition from 
the hydrodynamic to collisionless regime toward 
low temperatures (7), which occurs at a typical 
temperature of about $0.4T_c$ for superfluid 
helium (34). Nevertheless, the high-frequency 
sound peak is consistently well-resolved in the 
spectra down to the lowest achieved temperature 
of $0.42(2)T_c$. 


Sound speeds and superfluid fraction 


The normalized sound speeds $c_1/v_F$ and $v_s/v_F$ 
as a function of $T/T_c$ are reported in Fig. 3A. 
Our high-resolution spectra yield very accurate 
first sound speeds $c_1$ and adiabatic sound 
speeds $v_s$, with a typical relative error of just 
~1%, allowing us to determine the universal 
state functions (16-19) of the unitary Fermi 
gas through standard thermodynamic relations 
(32). Specifically, from the saturated first sound 
speed $c_1/v_F = 0.350(4)$ at the lowest achieved 
temperature $0.42(2)T_c$, we deduce the Bertsch 
parameter $\xi = 0.367(9)$ by using the relation 
$c_1/v_F = \sqrt{\xi/3}$. This value is in excellent agreement 
with the previous thermodynamic measurement 
value (18) corrected in (30) of $\xi = 
0.370(5)(8)$ and the latest quantum Monte 
Carlo result (35) of $\xi = 0.367(7)$. 
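
The conversion from the measured sound speed to the Bertsch parameter is a one-line inversion of $c_1/v_F = \sqrt{\xi/3}$. The Python sketch below shows it together with first-order (linear) error propagation, which is an assumption made here for illustration and may differ from the uncertainty analysis actually used; the function name is hypothetical.

```python
def bertsch_from_sound_speed(c1_over_vf, sigma_c1_over_vf):
    """Invert c1/v_F = sqrt(xi/3); propagate the uncertainty to first order."""
    xi = 3.0 * c1_over_vf**2
    sigma_xi = 6.0 * c1_over_vf * sigma_c1_over_vf   # |d(xi)/d(c1/v_F)| * sigma
    return xi, sigma_xi

xi_bertsch, sigma_xi = bertsch_from_sound_speed(0.350, 0.004)
print(f"xi = {xi_bertsch:.4f} +/- {sigma_xi:.4f}")
```

The central value and the size of the uncertainty obtained this way are compatible with the quoted $\xi = 0.367(9)$.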

For the second sound speed $c_2$, an intriguing 
feature is the sensitive temperature dependence: 
$c_2/v_F$ decreases rapidly with increasing 
temperature up to $0.98(2)T_c$ (Fig. 3B). Notably, 
$c_2/v_F$ suddenly jumps to a saturated value of 
~0.02 at $0.99(2)T_c$, implying the breakdown 
of hydrodynamics near the superfluid transition. 
From the measured sound speed $c_2$ or $v$, 
we determine a fundamental quantity for the 
macroscopic description of superfluidity, 
the superfluid fraction $n_s/n$, by applying the 
well-known relation $c_2^2 = (T s^2 n_s)/(m c_p n_n)$ or 
$v^2 = (T s^2 n_s)/(m c_v n_n)$, where $s$ is the entropy 
density and $n_n = n - n_s$ is the normal fluid density 
(32). As shown in Fig. 3C, the superfluid 
fraction of a unitary Fermi superfluid is close 
to that of superfluid helium (36) in the vicinity 
of the superfluid transition; for example, the 
system contains about 20% superfluid component 
at $0.97(2)T_c$. However, as the temperature 
decreases further, $n_s/n$ of the unitary 
Fermi superfluid notably deviates from 
that of superfluid helium, indicating the distinctive 
nature of superfluidity in the system. 
$n_s/n$ of the unitary Fermi superfluid has been 
indirectly extracted from the one-dimensional 
second sound speed measured in a highly elongated 
harmonic trap (21). Our direct measurements 
improve the accuracy on $n_s/n$, thereby 
providing a benchmark for theoretical calculations, 
which so far remain a notoriously difficult 
task in quantum many-body physics. 

Fig. 3. Sound speeds and superfluid fraction. (A) Temperature dependence of the normalized first sound speed $c_1/v_F$ (blue circles) and adiabatic sound speed $v_s/v_F$ (orange 
circles). (B) Temperature dependence of the normalized second sound speed $c_2/v_F$ (purple circles) and associated sound speed $v/v_F$ (green circles), where $v = \sqrt{c_1^2 + c_2^2 - v_s^2}$. 
Here, $v_F = \hbar k_F/m$ is the Fermi speed, and all the sound speeds are obtained from fitting the density response spectra (see Fig. 2). (C) Temperature dependence of 
the superfluid fraction $n_s/n$, compared with that of superfluid helium (green dash-dotted line). Vertical error bars represent one standard uncertainty obtained from the 
curve fitting and the measured universal thermodynamic functions; the horizontal error bars show the statistical uncertainty of the temperature determination. 


Sound diffusivity and transport coefficients 


To address the main focus of this work—the 
first and second sound attenuation—we pres- 
ent the temperature dependence of sound dif- 
fusivities and transport coefficients in Fig. 4. 
For the sound diffusivity $D_i$, two important 
features are evident, as shown in Fig. 4, A and 
C. One is that all the $D_i$ are in the vicinity of the 
quantum Heisenberg limit, that is, $D_i \sim \hbar/m$, 
which is anticipated for strongly correlated 
quantum liquids owing to the absence of 
well-defined quasiparticles (7). For instance, 
motivated by holographic duality (13, 37), the 
diffusivity $D$ of any diffusive mode should obey 
$D \gtrsim \hbar c^2/(k_B T)$, where $c$ is a typical speed scale 
of the system. By taking $T \sim T_c \approx 0.17T_F$ and 
$c \sim v_s \approx 0.43v_F$ at $T_c$ (see Fig. 3) for the unitary 
Fermi gas, we find the bound $D \sim \hbar/m$. The 
other one is that each diffusivity shows a 
sudden rise very close to $T_c$ (i.e., at $T \approx 0.95T_c$), 
with an increment $\Delta D_i \sim 0.3\hbar/m$. We interpret 
the sudden rise as a precursor of quantum 
criticality near the superfluid transition, where 
the sound attenuation and thermal conductivity 
start to exhibit critical divergence (3, 5). 
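
The order-of-magnitude estimate of the diffusivity bound can be made explicit. Writing $k_B T_F = m v_F^2/2$, the combination $\hbar c^2/(k_B T)$ becomes $2 (c/v_F)^2/(T/T_F)$ in units of $\hbar/m$. The short Python sketch below evaluates this for the quoted inputs; the function name is hypothetical and the result is only a scale estimate, not a measured quantity.

```python
def diffusivity_bound(c_over_vf, t_over_tf):
    """hbar*c^2/(k_B*T) expressed in units of hbar/m, using k_B*T_F = m*v_F^2/2."""
    return 2.0 * c_over_vf**2 / t_over_tf

bound_in_hbar_over_m = diffusivity_bound(c_over_vf=0.43, t_over_tf=0.17)
print(f"hbar c^2 / (k_B T) = {bound_in_hbar_over_m:.2f} hbar/m")
```

The result is a number of order unity times $\hbar/m$, which is the sense in which the bound is stated as $D \sim \hbar/m$.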

In liquid helium, the critical divergence has 
been observed both in the first and second 
sound attenuation in the temperature interval 
of $|T - T_c| < 1$ mK or $|t| < 5 \times 10^{-4}$ (9, 10). 
The quantum critical region of the unitary 
Fermi gas (i.e., $|t| < 0.05$) is thus about 100 
times larger than that of liquid helium. 
This extremely large critical region makes the 
unitary Fermi gas an ideal platform to determine 
the universal critical ratios [e.g., $R_2 = 
D_2/(2 c_2 \xi)$] that remain elusive (5). We note 
that the first sound diffusivity $D_1$ of the unitary 
Fermi gas has been measured recently (22), 
and the obtained results agree with ours. However, 
the sudden rise in $D_1$ near $T_c$ has not been 
resolved because of the relatively large uncertainty 
of the measurement (22). 

Two general damping mechanisms account 
for the sound attenuation: (i) The first is the 
viscous damping stemming from the diffusion 
of momentum, characterized by the shear viscosity 
$\eta$ and four bulk viscosities $\zeta_i$ ($i = 1, 2, 3, 
4$). For the unitary Fermi gas, most of the bulk 
viscosities vanish thanks to the scale invariance, 
and the only remaining $\zeta_3$ turns out to 
be negligible (38). (ii) The second is the thermal 
damping caused by the diffusion of heat, 
characterized by the thermal conductivity 
$\kappa$. The relative contribution of these mechanisms 
can be quantified by the dimensionless 
Prandtl number $\mathrm{Pr} = \eta c_p/\kappa$. From the sound diffusivity, 
we determine $\kappa = c_v[(D_1 + D_2) m n - 
4\eta n/(3 n_n)]$ and $\eta = 3 n m (D_1 + D_2 - D_s)/4$ (32) 
and present the results in Fig. 4, B and D, 
respectively. 
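
To make the extraction of the transport coefficients concrete, the following Python sketch applies the two relations, read here as $\eta = 3nm(D_1 + D_2 - D_s)/4$ and $\kappa = c_v[(D_1 + D_2)mn - 4\eta n/(3n_n)]$. The subscripts $c_v$ and $n_n$ and the minus sign are assumptions about how the printed expressions should be read, and all numerical values are arbitrary dimensionless numbers used only for a round-trip consistency check; the function name is hypothetical.

```python
def transport_from_diffusivities(d1, d2, ds, n, m, c_v, n_n):
    """Return (kappa, eta) from sound diffusivities, per the relations assumed above."""
    eta = 3.0 * n * m * (d1 + d2 - ds) / 4.0
    kappa = c_v * ((d1 + d2) * m * n - 4.0 * eta * n / (3.0 * n_n))
    return kappa, eta

# Round-trip check with arbitrary illustrative inputs (not experimental values).
eta_true, kappa_true = 1.0, 2.0
n, m, c_v, n_n = 1.0, 1.0, 1.5, 0.8

# Forward relations implied by the same two formulas.
d_sum = 4.0 * eta_true / (3.0 * m * n_n) + kappa_true / (m * n * c_v)
d_s = d_sum - 4.0 * eta_true / (3.0 * n * m)
d_1 = 0.4 * d_sum            # the split between D1 and D2 does not matter here
d_2 = d_sum - d_1

kappa_out, eta_out = transport_from_diffusivities(d_1, d_2, d_s, n, m, c_v, n_n)
print(kappa_out, eta_out)
```

Only the sum $D_1 + D_2$ and $D_s$ enter these two relations, so the way the total damping is shared between first and second sound does not affect the extracted $\kappa$ and $\eta$.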

The shear viscosity in Fig. 4D exhibits a 
weak temperature dependence below about 
$0.95T_c$, settling at a nearly constant value, 
the quantum limit $\eta \sim n\hbar$. However, a smooth, 
but pronounced, increase is observed in the 
vicinity of the superfluid transition. The trap-averaged 
shear viscosity of a unitary Fermi gas 
in a harmonic trap has been previously measured 
through anisotropic expansion (20), and 
the local shear viscosity has also been indirectly 
extracted (39). The $\eta$ obtained from our 
direct measurement is about two times larger 
than the previous result (39) in the superfluid 
phase. Moreover, as a quantitative measure, 
the inset of Fig. 4D shows the ratio of shear 
viscosity to entropy density $\eta/s$, which is expressed 
in the unit of $\hbar/(4\pi k_B)$, the lower 
bound conjectured by Kovtun, Son, and Starinets 
(KSS) for a perfect fluid (40). Around the 
superfluid transition, $\eta/s$ is about 18 times 
larger than the KSS bound, suggesting that 
the unitary superfluid is not a “perfect fluid.” 

The thermal conductivity $\kappa$ similarly attains 
the universal quantum limit $\sim n\hbar k_B/m$ below 
about $0.95T_c$, as shown in Fig. 4B. Notably, a 
weak but distinct divergence is revealed on 
both sides of the superfluid transition, leading 
to a pronounced lambda peak around $T_c$ with 
a considerable increment $\delta\kappa \sim 3n\hbar k_B/m$. This 
weak divergence is consistent with the dynamic 
critical scaling theory of the superfluid 
transition (3-5), in which $\kappa \sim |t|^{-\nu/2} \sim |t|^{-1/3}$. 
The observed sudden rise in the sound diffusivity, 
as shown in Fig. 4, A and C, can also be attributed 
to such a divergence. Finally, the Prandtl 
number Pr (inset of Fig. 4B) is about unity 
near the superfluid transition, suggesting that 
the viscous damping and thermal damping are 
equally important to the sound attenuation. 
The obtained $\mathrm{Pr} \approx 1$ implies that the unitary 
Fermi gas can be treated as a holographic conformal 
nonrelativistic fluid (41). In liquid helium, 
the investigations of critical divergence in the 
thermal conductivity above the $\lambda$-transition 
and in the second sound attenuation below 
the $\lambda$-transition play a vital role in setting up the 
effective theory for the critical mode across 
the superfluid transition (3-5). For the unitary 
Fermi gas, our measurements not only complete 
the macroscopic description of its superfluidity 
but also provide a means to understand 
the microscopic details of the superfluid transition 
in the strongly interacting regime. 

Fig. 4. Temperature dependence of sound diffusivities and transport coefficients. (A) The second sound 
diffusivity $D_2$. (B) The thermal conductivity $\kappa$. The inset shows the Prandtl number, with the line marking Pr = 1. 
(C) The first sound diffusivity $D_1$, together with the associated sound diffusivity $D_s$ in the inset, where 
$D_s = D_1 + D_2 - 4\eta/(3nm)$. (D) The shear viscosity $\eta$. The inset shows the viscosity-to-entropy ratio, in the units of 
$\hbar/(4\pi k_B)$. Away from the superfluid transition, the temperature dependence in $D_1$ and $D_2$ can be understood from 
the relations $D_1 \sim \eta/(nm)$ and $D_2 \sim \eta n_s/(n m n_n)$, which are valid at low temperatures. The saturated $D_1$ is 
consistent with a nearly constant shear viscosity, whereas the rapid increase of $D_2$ may be caused by the loss of 
the normal fluid component, that is, $n_s/n_n \to \infty$ as $T \to 0$. A similar temperature dependence of the second sound 
diffusivity $D_2$ has been observed in the superfluid helium (10). Vertical error bars represent one standard error. 


Outlook 


Our system offers great promise for studying 
many fundamental problems in quantum many- 
body systems with strong interactions. For 
example, by investigating the temperature and 
wave number dependence of the density re- 
sponse, the transition from collisionless to 
hydrodynamic behavior of the unitary Fermi 
gas can be fully characterized and thus illu- 
minate the establishment of hydrodynamics 
in the strongly interacting regime. Moreover, 
by adjusting the box-trap geometry (e.g., a 
longer longitudinal length) and further optimizing 
the system, Bragg spectroscopy with a 
smaller wave number and a higher energy resolution 
can be implemented. Therefore, a 
systematic exploration of the quantum critical 
region with improved temperature controlla- 
bility can be achieved, paving the way to map 
out several long-sought universal critical dy- 
namic scaling functions. Our setup can also be 
readily modified to realize a two-dimensional 
homogeneous Fermi superfluid and thus pro- 
vides an ideal platform for investigating the 
second sound attenuation and related quan- 
tum transport across the Berezinskii-Kosterlitz- 
Thouless transition. 

Reconfigurable perovskite nickelate electronics for 

Reconfigurable devices offer the ability to program electronic circuits on demand. In this work, we 
demonstrated on-demand creation of artificial neurons, synapses, and memory capacitors in post-fabricated 
perovskite NdNiO3 devices that can be simply reconfigured for a specific purpose by single-shot electric pulses. 
The sensitivity of electronic properties of perovskite nickelates to the local distribution of hydrogen ions 
enabled these results. With experimental data from our memory capacitors, simulation results of a reservoir 
computing framework showed excellent performance for tasks such as digit recognition and classification of 
electrocardiogram heartbeat activity. Using our reconfigurable artificial neurons and synapses, simulated 
dynamic networks outperformed static networks for incremental learning scenarios. The ability to fashion the 
building blocks of brain-inspired computers on demand opens up new directions in adaptive networks. 


Continual learning in artificial intelligence 
(AI) presents a formidable challenge. 
Models are generally trained on stationary 
data distributions, and thus when 
new data are presented incrementally 
to a neural network, this interferes with the 
previously learned knowledge, resulting in 
poor performance, which is known as catastrophic 
forgetting and remains an active field 
of research (1, 2). One of the major approaches 
to tackle this issue is to actively adapt the 
structure of the network itself when new data 
becomes available. Not only does modulating 
the architecture of the network in response to 
the input distribution allow the network to 
manage its resources efficiently, recent discov- 
eries also suggest that a dynamic network can 
show better performance as compared with 
that of a static network when provided with 
equal resources (3, 4). Moreover, as smart edge
devices become more integrated into society, 
they will require the implementation of so- 
phisticated networks in hardware constrained 
by both chip area and power. Having the abil- 
ity to reallocate network resources dynamically 
to perform various tasks in an ever-changing 
environment will be of fundamental impor- 
tance (3). Having programmable capabilities 
in hardware can be game changing for future 
computers whose designs are inspired by the 
intelligence of animal brains. 

In this work, we showed that perovskite 
nickelates, a class of quantum materials that 
undergo room-temperature electronic phase 
transitions upon hydrogen doping, enable a 
versatile, reconfigurable hardware platform 
for adaptive computing. A single device made 
from H-doped NdNiO3 (NNO), for example,
could be electrically reconfigured on demand 
to take on the functionalities of either neu- 
rons, synapses, or memory capacitors (Fig. 1A). 
Such versatile tunability was distinctively en- 
abled by the synergistic combination of a vast 
array of metastable configurations for protons 
in the perovskite lattice that can also be volt- 
age controlled. Although a variety of ionic- 
electronic switches are being explored for 
neuromorphic computing (5-10), complete 
reconfiguration of neuromorphic functions 
has remained elusive. To demonstrate exam- 
ple applications in AI, we used the experi- 
mental data from our memory capacitors in a 
reservoir computing (RC) framework (Fig. 1B), 
a brain-inspired machine learning architecture, 
and simulation results demonstrated excel- 
lent performance comparable with those of 
theoretical and experimental reservoirs. The 
experimental characteristics of neurons and 
synapses obtained from the perovskite nickel- 
ate devices and their run-time reconfigurability 
were leveraged to design self-adaptive dynamic 
grow-when-required (GWR) networks (Fig. 1C).
Motivated by the cortical data processing in the 
brain, GWR networks present an unsupervised 
approach to lifelong learning in real-world sce- 
narios with limited availability of training 
samples, which in turn may have missing or 
noisy labels. We demonstrated that such net- 
works can exploit the creation and deletion of 
network nodes on the fly to offer greater rep- 
resentation power and efficiency in compar- 
ison with those of static counterparts. 
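
To make the reservoir computing idea concrete, the following is a minimal sketch, not the simulation used in this work. It assumes generic tanh nodes in place of the measured memory-capacitor characteristics, a toy task (recalling the previous input), and hypothetical names and parameters (`make_reservoir`, `run_reservoir`, `train_readout`, 20 nodes, a learning rate of 0.05). The random reservoir weights stay fixed, and only the linear readout is fitted by gradient descent.

```python
import math
import random


def make_reservoir(n_nodes, seed=0, gain=0.9):
    """Build random input and recurrent weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_nodes)]
    scale = gain / math.sqrt(n_nodes)  # keeps the recurrent feedback weak
    w_rec = [[rng.uniform(-1.0, 1.0) * scale for _ in range(n_nodes)]
             for _ in range(n_nodes)]
    return w_in, w_rec


def run_reservoir(w_in, w_rec, inputs):
    """Drive the reservoir with a scalar sequence and record every state."""
    n = len(w_in)
    state = [0.0] * n
    states = []
    for u in inputs:
        state = [math.tanh(w_in[i] * u
                           + sum(w_rec[i][j] * state[j] for j in range(n)))
                 for i in range(n)]
        states.append(state)
    return states


def train_readout(states, targets, lr=0.05, epochs=200):
    """Fit only the linear readout with per-sample gradient descent."""
    w_out = [0.0] * len(states[0])
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in zip(states, targets):
            err = sum(w * xi for w, xi in zip(w_out, x)) - y
            total += err * err
            w_out = [w - lr * err * xi for w, xi in zip(w_out, x)]
        losses.append(total / len(states))
    return w_out, losses


# Toy task (an assumption for illustration): output the previous input.
rng = random.Random(1)
inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]
targets = [0.0] + inputs[:-1]

w_in, w_rec = make_reservoir(n_nodes=20)
states = run_reservoir(w_in, w_rec, inputs)
w_out, losses = train_readout(states, targets)
print(f"mean squared error: first epoch {losses[0]:.3f}, "
      f"last epoch {losses[-1]:.3f}")
```

Because the reservoir itself is never updated, training touches only one weight per node, which is the reduction in training complexity that motivates RC. A device-based reservoir would presumably replace the tanh update with the measured device response.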


Results and discussion 


Perovskite nickelates (chemical formula 
ReNiO3, where Re is a rare-earth ion such as
Nd) are a class of quantum materials whose 
electronic properties are mediated by strong 
electron interactions. Pristine NNO is a cor- 
related metal at room temperature. Hydro- 
gen dopants as electron donors can lead to a 
reduction in electrical conductivity by several 
orders of magnitude through modifying the Ni 
orbital configuration (11). Gently redistributing
the hydrogen ions (protons) already doped 
in the lattice by electric fields can modify the 
electrical conductivity systematically to gener- 
ate a multitude of electronic states. For ex- 
ample, by annealing NNO devices in hydrogen 
gas (with catalytic electrodes such as Pd or Pt), 
hydrogen can be doped interstitially into the 
NNO lattice proximal to the electrode. The hy- 
drogen atoms then donate electrons to the Ni 
d orbitals, which changes the filling state in 
the NNO d band and results in a phase transition
with a change in resistivity of several orders
of magnitude. (From here on, the hydrogen-doped
NNO will be referred to as H-NNO for
simplicity.) A vast array of metastable energy 
states are available to the protons in the lat- 
tice, and thus, their distribution and local con- 
centration (and therefore function) can be 
subsequently modulated with electric fields 
applied to the electrode. The switching mech- 
anism of the H-NNO device is compared with 
traditional nonfilamentary resistive memory 
devices in table S1. 

To demonstrate reproducible electrical re- 
configuration in H-NNO, 50-nm-thick NNO 
films were deposited through different meth- 
ods, sputtering and atomic layer deposition 
(ALD), as well as on different substrates, 
LaAlO3 and SiO2 on Si (structural characterizations
of representative pristine NNO films
are provided in fig. S3, and device details are 
provided in fig. S4). First, we described the 
capacitive behavior (charge storage) in our 
devices. Capacitors not only are useful for 
storing charge in the conventional sense but 
are also central for numerous brain-inspired 
computing architectures. Evolution of mem- 
capacitive loop states in the perovskite nickel- 
ate device as a function of hydrogen doping is 
shown in fig. S5. With increasing hydrogen 
doping, the H-NNO film resistance increased 
[Fig. 1 panel labels: Reconfigurable perovskite device; Neuromorphic computing; Reservoir network]


Fig. 1. Reconfigurable perovskite devices. (A) Schematic of hydrogen-doped 
perovskite nickelate as a versatile reconfigurable platform that can be electrically 
transformed between neurons, synapses, and memory capacitors to enable 
adaptive neuromorphic computing. By applying electric pulses, the hydrogen 
ions in the nickelate lattice can occupy metastable states and enable distinct 
functionalities. (B) Schematic of a generic RC framework. An input layer 
distributes the signals into the reservoir, which projects the inputs into a high- 
dimensional space. Here, the reservoir is built randomly from programmable 
devices with memory. No training happens in the reservoir; only the linear 


and then eventually saturated at ~10° ohms 
(fig. S5A). Without any hydrogen recharging 
process to the device for 6 months, hydro- 
gen remained in the NNO lattice, and the 
resistance of the H-NNO device was stable. 
To explore the capacitive behaviors of the H- 
NNO device at different hydrogen doping 
states, we performed cyclic voltage sweeps 
(fig. S5, B to H). Pristine and weakly doped
perovskite NNO showed linear resistor behavior.
At the intermediate doping state, capacitive
behavior appeared. Electrical reconfiguration
of the H-NNO device is summarized in Fig. 2, 
A to F. By applying positive and negative 
electric pulses, the resistance state of the 
device could be modulated carefully, and the 
programmed resistance states are nonvolatile 
(fig. S6). At the electronic state i, cyclic voltage 
sweep measurements of the nickelate device 
were performed, and linear resistor behavior 
was observed (Fig. 2A). The electronic state i 
was then switched to electronic state ii by applying
a single voltage pulse, where a current-voltage
(I-V) loop appeared, indicating stored
energy in the device (Fig. 2B). Memristive 
and memcapacitive behaviors were also dem- 
onstrated at state ii (supplementary text 2). 
Next, we showed the creation of artificial 
neurons and synapses (that are responsible 
for information transfer and memory in the 
brain) from the same device. Spiking neuronal 
behavior in the H-NNO device was studied at 
the electronic state iii (Fig. 2C). Consecutive 
electric stimuli were applied to the device, 
and once a critical level was reached, abrupt 
changes in the device resistance were ob- 
served. The nonvolatile neuronal response of 
the nickelate device to electric stimulus de- 
pended on both pulse voltage and pulse width 
(figs. S7 and S8). A typical spiking probability 
plot is shown in Fig. 2D, which could be di- 
rectly implemented in neural networks. We 
then demonstrated synaptic behavior at elec- 
tronic state iv in the nickelate device by means 
of continuous voltage sweeps (Fig. 2E). As shown
in fig. S9, threshold pulse fields were inves- 
readout layer is trained by a simple gradient descent algorithm. The role of
the readout layer is to map the high-dimensional dynamics of the reservoir to
the output states. (C) Schematic of GWR networks. As the network is shown
various classes of data, it maps high-dimensional data to a low-dimensional 
map field to perform clustering on the classes. When a new class is added to 
the input stream, the network can detect the new input and grow in size by 
adding network nodes to accommodate it. Additionally, if any of the classes do 
not appear in the input stream for a long time, the corresponding nodes 
become inactive, saving resources. 


tigated for both high-resistance state (HRS; 
state iii) and low-resistance state (LRS; state 
iv). At LRS, a smaller threshold pulse field 
(Vth) was sufficient to modulate the device
resistance, which was suitable for analog be- 
havior with gradual resistance changes. How- 
ever, this analog update of device resistance 
prohibited the sudden jump in resistance necessary
for spiking. At HRS, a much higher Vth
was required to change the resistance and was
beneficial for spiking neuronal behavior. Last, 
the linear resistor state v in Fig. 2F could be 
restored by applying a single electric pulse. 
Electrical reconfiguration at various resistance 
states of the H-NNO device is shown in fig. 
S10, demonstrating versatility of the device 
platform. After 1.6 x 10° cycles of endurance 
measurement of a scaled nickelate device, 
we performed electrical reconfiguration of the 
device, and the results showed that all func- 
tional modes were reproducible (fig. S11). 
For example, configuration between linear 


resistor and capacitor states at initial and after 
Fig. 2. A single perovskite device can be electrically reconfigured to
perform essential functions in a neuromorphic computer. (A) Nickelate
device as a linear resistor under cyclic voltage sweep. (B) Nickelate device as a
capacitor under cyclic voltage sweep. The appearance of an I-V loop indicates
stored energy in the device. I-V loops of different sizes can be generated by
applying pulse fields (ii). Complete details can be found in supplementary text 2.
(C) Nickelate device as a spiking neuron (iii). Resistance changes of the
nickelate device were monitored in response to consecutive electric pulses
(-0.45 V/µm for 1 µs). After the spike fired, the resistance of the device was
restored to the original state by applying a reset voltage pulse (+0.45 V/µm for
1 µs). (D) Spiking probability of nickelate device as a function of pulse field,


showing stochastic behavior. (E) Nickelate device as a synapse. I-V curves 

of nickelate devices were measured under continuous voltage sweeps. The 
resistance of the device increases continuously, showing analog synaptic 
updates. (F) Resetting the nickelate device back to initial linear resistor state. 
(G and H) Representative electrical reconfiguration between linear resistor and 
capacitor of the scaled nickelate device at initial and after 1.6 x 10° cycles 

of endurance measurement. Reconfiguration of all modes is presented in

fig. S11. Details of endurance measurement are provided in fig. S22. (I) Spatial 
mapping of Raman (signal to baseline of Raman shift ranging from 320 to 
470 cm⁻¹) of a 15 by 3 µm² rectangular area near the Pd electrode for the
H-NNO device at both HRS and LRS. Scale bar, 3 µm. The bright areas
correspond to NNO regions in the nickelate device, which showed strong
peak intensity of the T2g mode at ~439 cm⁻¹. The normalized T2g peak intensity
[I(T2g,area)/I(T2g,max)] near the Pd electrode was obtained from the dashed
rectangular area. The relative peak intensity of the H-NNO device at LRS 
was 0.77, whereas that of H-NNO at HRS dropped to 0.68. (J) Near-field 
spectrum (TERS) of H-NNO device at LRS (green) and at HRS (orange), when 


1.6 x 10° cycles of endurance measurement are 
presented in Fig. 2, G and H, respectively. A 
single device could be reconfigured as resistor, 
memcapacitor, neuron, or synapse with electric 
pulses. 
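
The stochastic spiking behavior described above (Fig. 2D) can be mimicked in software by a probabilistic neuron. The sketch below is only an illustration: the logistic shape of the probability curve, the `threshold` and `steepness` values, and the function names are assumptions and are not fitted to the measured data.

```python
import math
import random


def spike_probability(field, threshold=0.45, steepness=20.0):
    """Hypothetical logistic spiking probability versus pulse field (V/um)."""
    return 1.0 / (1.0 + math.exp(-steepness * (field - threshold)))


def count_spikes(field, n_pulses, rng):
    """Apply n_pulses identical pulses and count how many trigger a spike."""
    p = spike_probability(field)
    return sum(1 for _ in range(n_pulses) if rng.random() < p)


rng = random.Random(0)
for field in (0.30, 0.45, 0.60):
    fired = count_spikes(field, n_pulses=1000, rng=rng)
    print(f"field {field:.2f} V/um: {fired} spikes in 1000 pulses")
```

In a network simulation, a measured probability curve could stand in for `spike_probability`, leaving the rest of the sampling logic unchanged.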

To understand the nanoscale mechanisms 
that enable electrical reconfiguration, we per- 
formed in-depth characterization on represent- 
ative H-NNO devices at LRS and HRS (Fig. 2, I 
to L) that correspond to synapse and neuronal 
states, respectively. Confocal Raman spectra 
ranging from 300 to 550 cm⁻¹ were first collected
from two control samples: a pristine
NNO film near the Pd electrode and a heavily
doped NNO film near the Pd electrode (fig.
S12). The T2g mode of NNO was present at
~439 cm⁻¹ for pristine NNO, whereas it disappeared
for heavily doped NNO, indicating
dense proton concentration near the Pd electrode.
We performed two-dimensional (2D)
Raman mapping (signal to baseline mode, scan
range from 320 to 470 cm⁻¹) over a rectangular
region (15 by 3 µm²) at this boundary for the
H-NNO device at LRS and at HRS in Fig. 2I.
The relative peak intensity of the T2g mode of the
H-NNO device at LRS was 0.77, whereas for 
HRS this dropped to 0.68, indicating higher 
local proton distribution of H-NNO at HRS 
near the Pd electrode. Near-field tip-enhanced 
Raman scattering (TERS) was carried out on the
H-NNO device at LRS and at HRS near the Pd
electrode (Fig. 2J). Details of control experiments
for near-field TERS are provided in fig.
S13. A broad T2g peak of NNO could be seen
near the Pd electrode at LRS; however, no
such peak was detected on NNO near
the Pd electrode for H-NNO at HRS, indicating
relatively higher proton concentration near
the Pd electrode. We used scattering-type scanning
near-field optical microscopy (s-SNOM)
at a laser frequency of ω = 952 cm⁻¹ to image
the local distribution of doping of H-NNO 
devices at LRS and HRS. Details of control 
experiments for s-SNOM on reference devices 
are included in fig. S14. Second-harmonic
infrared (IR) (ω = 952 cm⁻¹) near-field amplitude
images of the H-NNO device at LRS and
HRS near the Pd electrode are shown in Fig. 2, 
K and L, insets, respectively. Normalized am- 
plitude line profiles of the NNO devices at LRS 
and HRS are provided in fig. S15. The first 
derivative of the normalized amplitude in- 
dicates proton concentration changes near 
the boundary between the Pd electrode and
H-NNO channel shown in Fig. 2, K and L. At
HRS, the proton concentration changed over
a longer lateral distance compared with that
at LRS. The s-SNOM amplitude signal differences
revealed local chemical composition
differences in H-NNO at different functional 
states, which was consistent with Raman re- 
sults. Further, the carrier localization length 
scale of H-NNO device at HRS was smaller than 
that at LRS, as determined from temperature- 
dependent electrical transport measurements 
(fig. S16). The nanoscale characterization of devices
consistently showed that the local
proton distributions of the H-NNO device at LRS
and HRS near the Pd electrode were different.
Density functional theory (DFT) calculations
further indicated that differences in
the location of protons could lead to modulation
of the energy band gap of NNO (figs. S17 to
S20), which is of relevance to different functional
states. Nudged elastic band (NEB) calculations
showed that the proton migration
barrier could vary from 0.2 to 0.6 eV, depend- 
ing on the migration path (supplementary text 1). 
Therefore, different local proton distributions 
at LRS and at HRS of the H-NNO device could 
lead to different functional states. 

We also fabricated nickelate devices with 
100 nm gap size to demonstrate scalability, 
endurance, reproducibility, and ultralow en- 
ergy consumption (figs. S21 to S24). In scaled 
devices, electrical reconfiguration could be 
realized with <10-ns electric pulses. The energy
cost for a single synaptic update was
~2 fJ, which is comparable with that in the
brain (1 to ~100 fJ) (12). To demonstrate compatibility
with CMOS (complementary metal-oxide
semiconductor) technology, nickelate
devices were fabricated on SiO2 on Si substrates
through both sputtering and ALD
(an industrial technique used to grow high- 
quality metal-oxide films for state-of-the-art 
electronics), and data are shown in figs. S25
and S26.

To showcase applications of the adaptive 
nickelate hardware, we applied the experi- 
mental memristive and memcapacitive behav- 
iors in RC, a brain-inspired machine-learning 
architecture that addresses the issue of train- 
ing complexity and parameter explosion, com- 
monly observed in traditional recurrent neural 
networks (RNNs), by only adapting a simple 
the Ag tip was engaged near the Pd electrode. The dashed line indicates
the fitting of the Raman peak. At LRS of H-NNO, the T2g mode was found near the
electrode and was suppressed for H-NNO at HRS. (K and L) Zoom-in of
first derivative of the normalized second-harmonic IR near-field amplitude
of the H-NNO device at LRS and at HRS near the boundary between the
Pd and H-NNO. (Insets) Second-harmonic IR (ω = 952 cm⁻¹) near-field
amplitude images of H-NNO devices at LRS and HRS, respectively. Scale


output layer. RC explains higher-order cog- 
nitive functions and the interaction of short- 
term memory with other cognitive processes 
(13). Details can be found in supplementary 
text 2. To have a baseline comparison, we eval- 
uated the performance of our H-NNO device 
in comparison with theoretical models (14, 15) 
and experimental reports (16, 17) for three
different tasks: MNIST (Modified National 
Institute of Standards and Technology database) 
digit recognition, isolated spoken digit recognition, 
and ventricular heartbeat classification on an 
electrocardiogram (ECG) dataset. The simu- 
lation results in Fig. 3, A to C, demonstrate 
that our H-NNO reservoirs could achieve com- 
parable performances on the three tasks with 
fewer devices compared with the theoretical 
and experimental reservoirs. The results of 
performance-device ratios in Fig. 3D show 
that our H-NNO reservoirs, on average, out- 
performed the theoretical and experimental 
reservoirs by a factor of 1.4×, 1.2×, and 5.1× for
MNIST, isolated spoken digits, and ECG heart- 
beat, respectively. Detailed explanations of the 
performance are in supplementary text 2. 
Having the neuronal and synaptic function- 
ality in a single type of device could enable 
compact and energy-efficient neuromorphic 
system designs. Discussion on deep neural 
networks that use such perovskite networks is 
given in supplementary text 3. Furthermore, 
the ability to reconfigure devices for multiple 
neuromorphic functions opens up their innovative
use in next-generation AI, namely,
in the emerging domain of dynamic neural 
networks. The GWR network is one such ex- 
ample that creates new nodes and their inter- 
connections according to competitive Hebbian 
learning. The GWR networks expand on the 
concept of self-organizing neural networks 
by adding or removing network nodes in an 
unsupervised manner to approximate the 
input space accurately and at times more 
parsimoniously as compared with a static 
self-organizing map (18). We can compare
the dynamic GWR with a static self-organizing 
network that uses the same Hebbian learning 
scheme but has a fixed number of nodes, ini- 
tialized randomly in the beginning. We trained 
our network on two archetypal datasets used 
to evaluate performance in literature, MNIST 
(19) and a subset of CUB-200 (20), to simulate 
how such a network will perform on the fly. 
Fig. 3. Reservoir computing
simulations with data measured
from nickelate devices. (A to
D) The simulation results of
reservoirs with H-NNO devices,
compared with theoretical and
experimental memristive models of
reservoirs, demonstrated that a
large and random network of
H-NNO devices could function as a
hardware platform for neuromorphic
computing in solving complex
tasks. The simulation results were
based on the average results of 
simulating a sample size of 

100 reservoirs with similar hyperparameters
for each reservoir type
to reduce uncertainty owing to 

the stochastic nature of reservoir 
networks. As shown in (A) to (C), the 
H-NNO reservoirs could achieve 
comparable performances on 

three tasks with fewer devices. The 
performance/device ratios in (D) 
indicate that the H-NNO reservoirs, 
on average, outperformed the 
theoretical and memristive 
reservoirs by a factor of 1.4×,
1.2×, and 5.1× for MNIST, isolated
spoken digits, and ECG heartbeats, 
respectively. 
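
One plausible reading of the performance/device ratio in (D) is task accuracy divided by the number of devices in the reservoir. The exact definition is not stated in this section, so the sketch below is an assumption. The accuracies and device counts are hypothetical placeholders chosen only to show the arithmetic, not values from this work.

```python
def performance_per_device(accuracy, n_devices):
    """Accuracy obtained per device (assumed definition of the ratio)."""
    return accuracy / n_devices


def improvement_factor(ours, baseline):
    """How many times larger our ratio is than the baseline ratio."""
    return performance_per_device(*ours) / performance_per_device(*baseline)


# Hypothetical (accuracy, device count) pairs, for illustration only.
ours = (0.90, 100)
baseline = (0.90, 140)
factor = improvement_factor(ours, baseline)
print(f"improvement factor: {factor:.2f}x")
```

Under this reading, a reservoir that matches a baseline's accuracy with fewer devices gains a factor equal to the ratio of the device counts.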


Discussion on the datasets and details of the 
simulation are available in the supplementary 
materials, materials and methods, and supple- 
mentary text 4. The GWR network’s ability to 
dynamically respond to changes in the input 
distribution is visualized in Fig. 4A for MNIST. 
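
The grow-and-shrink behavior described above can be sketched in a few lines. This is a simplified illustration, not the network simulated in this work: it omits the neighbor edges, edge aging, and second-best-match updates of the full GWR algorithm and instead removes nodes that have not won for `idle_limit` steps. All thresholds, the two-dimensional toy classes, and the names `GrowWhenRequired` and `stream` are assumptions.

```python
import math


class GrowWhenRequired:
    """Simplified grow-when-required network (illustrative sketch only)."""

    def __init__(self, initial_nodes, activity_threshold=0.6,
                 habituation_threshold=0.3, learning_rate=0.2,
                 habituation_decay=0.7, idle_limit=50):
        self.nodes = [list(w) for w in initial_nodes]
        self.habituation = [1.0] * len(self.nodes)  # 1.0 means "fresh"
        self.idle = [0] * len(self.nodes)           # steps since last win
        self.activity_threshold = activity_threshold
        self.habituation_threshold = habituation_threshold
        self.learning_rate = learning_rate
        self.habituation_decay = habituation_decay
        self.idle_limit = idle_limit

    def train_step(self, x):
        dists = [math.sqrt(sum((w - xi) ** 2 for w, xi in zip(node, x)))
                 for node in self.nodes]
        best = min(range(len(dists)), key=dists.__getitem__)
        activity = math.exp(-dists[best])
        self.idle = [steps + 1 for steps in self.idle]
        self.idle[best] = 0
        poorly_matched = activity < self.activity_threshold
        well_trained = self.habituation[best] < self.habituation_threshold
        if poorly_matched and well_trained:
            # Grow: place a new node halfway between the winner and the input.
            self.nodes.append([(w + xi) / 2
                               for w, xi in zip(self.nodes[best], x)])
            self.habituation.append(1.0)
            self.idle.append(0)
        else:
            # Adapt: move the winner toward the input, less so once habituated.
            rate = self.learning_rate * self.habituation[best]
            self.nodes[best] = [w + rate * (xi - w)
                                for w, xi in zip(self.nodes[best], x)]
        self.habituation[best] = max(
            0.05, self.habituation[best] * self.habituation_decay)
        self._prune()

    def _prune(self):
        """Remove nodes that have not won for more than idle_limit steps."""
        keep = [i for i, steps in enumerate(self.idle)
                if steps <= self.idle_limit]
        self.nodes = [self.nodes[i] for i in keep]
        self.habituation = [self.habituation[i] for i in keep]
        self.idle = [self.idle[i] for i in keep]


CENTERS = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]  # toy "classes"
JITTER = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]


def stream(classes, n_samples):
    """Deterministic toy input stream cycling through the given classes."""
    for i in range(n_samples):
        cx, cy = CENTERS[classes[i % len(classes)]]
        dx, dy = JITTER[(i // len(classes)) % len(JITTER)]
        yield (cx + dx, cy + dy)


net = GrowWhenRequired(initial_nodes=list(stream([0, 1], 2)))
sizes = []
# Phase 1: two classes; phase 2: all four classes; phase 3: two classes again.
for classes, n in (([0, 1], 40), ([0, 1, 2, 3], 400), ([0, 1], 200)):
    for x in stream(classes, n):
        net.train_step(x)
    sizes.append(len(net.nodes))
print("nodes after each phase:", sizes)
```

The three phases mirror the sequence in Fig. 4A: the node count rises when new classes enter the input stream and falls again once they stop appearing.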

For both the datasets and networks, we 
conducted two sets of simulations using the 
experimental data from our H-NNO devices: 
(i) incremental learning, in which the network 
is shown newer classes of data over time, and 
(ii) assessing the effect of growing or shrink- 
ing compared with static networks—how ef- 
ficiently the GWR can represent the input 
space. The network’s test accuracy and the 
number of nodes as each new class was trained 
for both the datasets in the incremental learn- 
ing scenario are shown in Fig. 4, B to E. We 
observed that the dynamic network was able 
to retain its learned representations much 
better than the static network could, with
a final test accuracy that was 212% higher
for MNIST and 250% higher for CUB-200.
By growing its size, the
network avoided suffering from catastrophic 
forgetting and showed only a smooth degra- 
dation in performance as the number of classes 
was increased. The size of the static network 
was chosen to be equal to the maximum num- 
ber of nodes that the GWR network required. 
This arrangement ensured that the difference 
we observed was not due to the size difference 
of the two networks but rather because of the 
dynamic network’s ability to grow and learn. 
We then studied the ability of the GWR net- 
work to dynamically change its size to adapt 
to the input space. First, we assessed the 
networks’ ability to grow as the number of 
classes in the network was increased abruptly 
(Fig. 4, F and G). Initially, we presented the 
networks with the first half of the total num- 
ber of classes in the datasets, and the GWR 
grew and saturated in size. Afterward, when 
the networks were presented with the entire 
dataset, the GWR rapidly grew its size to ac- 
commodate the change. The static network 
was not able to do so and thus failed to learn 
the new data, also suffering degradation in 
performance in the initial classes (detailed 
accuracy results are provided in supplemen- 
tary text 4 and figs. S27 and S28). Overall, 
the dynamic networks achieved better accu- 
racy on the test set in comparison with that 
of the static network: 210% for MNIST and 
170% for CUB-200. Next, we demonstrated that 
the GWR was able to efficiently allocate its 
resources compared with a large static net- 
work. We presented the network with all the 
classes of the dataset at the beginning. After 
learning occurred, we removed half the cat- 
egories and let the GWR network reduce its 
size and reach an equilibrium number of nodes 
(Fig. 4, H and I). We found that the GWR was
able to retain a level of performance similar
to that of the large static network (accuracy
difference, 2 to 3%) on the subset of interest 
and demonstrated higher efficiency through 
shrinking its size by ~47% for MNIST and 
~27% for CUB-200 (detailed accuracy results 
are provided in supplementary text 4 and 
figs. S29 and S30). In addition to simulation
studies, we conducted proof-of-concept exper- 
iments to demonstrate the reconfiguration 
ability of the H-NNO devices in hardware 
for an incremental learning scenario, in 
comparison with a static network. Detailed 
discussions on the results are included in sup- 
plementary text 5. 


Conclusions 


We have demonstrated artificial neurogene- 
sis in perovskite electronic devices: the ability 
to reconfigure hardware building blocks for 
brain-inspired computers on demand within a 
single device platform. Dynamic deep learn- 
ing networks simulated with the experimen- 
tally measured characteristics of the nickelate 
devices consistently outperformed static coun- 
terparts. The results showcase the potential of 
reconfigurable perovskite quantum electronic 
devices for emerging computing paradigms 
and AI machines. Additionally, semiconductor 
technology-compatible ALD on Si platforms 
Fig. 4. Dynamic grow-when-required computing with experimental
characteristics measured from nickelate devices. (A) Visualization of the
GWR network's ability to dynamically respond to changes in the input
distribution over time for the MNIST dataset. First, we showed the network
10,000 input samples from the first five classes (“0” to “4”) of the MNIST
dataset. The network could grow and learn the representation as seen in i. Next,
the network was trained on 20,000 samples from all the 10 classes of the
MNIST. Because of the addition of new classes, the network grew in size and
accommodated them, as seen in ii. The accuracy over all the classes is shown in
the bar chart (top right). Last, we again changed the input class distribution
by only showing the network the classes “0” to “4”. We observed that the
network could gradually shrink its size as nodes associated with the last
five classes slowly became inactive and were removed from the network, as
seen in iii. Here, the digits are the learned representations of the nodes, whereas
each unit of the black region indicates an unused and inactive node in the
network. (B to E) Network performance for incremental learning of classes.
(B) Test accuracy for MNIST as the number of classes is incrementally increased
from 1 to 10. (C) Number of nodes as the number of classes is increased for
MNIST. (D) Test accuracy for the 50 classes of CUB-200 as the number of
classes is incrementally increased from 1 to 50. (E) Number of nodes as the
number of classes is increased for CUB-200. (F to I) Assessing the effect
of dynamically changing size of GWR compared with static network with fixed
number of nodes. [(F) and (G)] The GWRs achieved 51.6% better accuracy
on MNIST and 41.3% better accuracy on CUB-200 as compared with static
networks that were not allowed to grow beyond the size of the dynamic network
before learning the new classes. [(H) and (I)] We observed similar performance
on the classes that are available in both the networks; however, the dynamic
network achieved these results with almost half (~47.3%) the number of
resources and nodes for MNIST and ~27% fewer nodes for the 50 classes
of CUB-200 compared with the static networks.

We discovered a highly virulent variant of subtype-B HIV-1 in the Netherlands. One hundred nine individuals
with this variant had a 0.54 to 0.74 log10 increase (i.e., a ~3.5-fold to 5.5-fold increase) in viral load
compared with, and exhibited CD4 cell decline twice as fast as, 6604 individuals with other subtype-B 
strains. Without treatment, advanced HIV—CD4 cell counts below 350 cells per cubic millimeter, with 
long-term clinical consequences—is expected to be reached, on average, 9 months after diagnosis 

for individuals in their thirties with this variant. Age, sex, suspected mode of transmission, and place of 
birth for the aforementioned 109 individuals were typical for HIV-positive people in the Netherlands, 
which suggests that the increased virulence is attributable to the viral strain. Genetic sequence analysis 
suggests that this variant arose in the 1990s from de novo mutation, not recombination, with increased 
transmissibility and an unfamiliar molecular mechanism of virulence. 
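
The fold changes quoted above follow directly from the log10 differences, because a difference of d on the log10 scale corresponds to a factor of 10^d. A minimal check of that arithmetic (the function name is hypothetical and used only for illustration):

```python
def log10_increase_to_fold(delta_log10):
    """Convert a difference on the log10 scale into a fold change."""
    return 10 ** delta_log10


low = log10_increase_to_fold(0.54)
high = log10_increase_to_fold(0.74)
print(f"0.54 log10 -> {low:.2f}-fold, 0.74 log10 -> {high:.2f}-fold")
```

Both values round to the ~3.5-fold and 5.5-fold figures given in the text.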


The risk posed by viruses evolving to greater
virulence (i.e., causing greater damage
to their hosts) has been extensively studied
in theoretical work despite few population-level
examples (1-3). The most notable
recent example is the B.1.617.2 lineage (Delta 
variant) of severe acute respiratory syndrome 
coronavirus 2 (SARS-CoV-2), for which an in- 
creased probability of death has been reported 
(4-6), as well as increased transmissibility 
(7, 8). RNA viruses have long been a particular 
concern, as their error-prone replication re- 
sults in the greatest known rate of mutation— 
and thus high potential for adaptation. Greater 
virulence could benefit a virus if it is not out- 
weighed by reduced opportunity for transmis- 
sion. These antagonistic selection pressures 
may result in an intermediate level of virulence 
being optimal for viral fitness, as observed for 
HIV (9). Concrete examples of such evolution in 
action, however, have been elusive. Continued 
monitoring of HIV virulence is important for 
global health: 38 million people currently live 


with the virus, and it has caused an estimated 
33 million deaths (www.unaids.org). 

The main (M) group of HIV-1, responsible 
for the global pandemic, first emerged around 
1920 in the area of what is now Kinshasa, 
Democratic Republic of the Congo (10), and 
had diversified into subtypes by 1960 (11). The 
subtypes, and the most common circulating 
recombinant forms (CRFs) between the sub- 
types, took different routes for global spread, 
establishing strong associations with geography
(12), ethnicity, and mode of transmission.
Differences in virulence between subtypes and 
CRFs have been reported, though it is chal- 
lenging to disentangle genotypic effects on 
virulence from confounding effects while re- 
taining large sample sizes, given the strong 
associations between viral, host, and epidemi- 
ological factors (13). The co-receptor used for 
cell entry has long been understood to affect 
virulence (14, 15), and this has been proposed
as a mechanism that underlies differences in
virulence between subtypes and CRFs (13), 


as well as one reported difference within a 
CRF (16).

HIV-1 virulence is most commonly measured 
by viral loads (the concentration of viral par- 
ticles in blood plasma) and CD4 counts (the 
concentration of CD4+ T cells in peripheral
blood, which tracks immune system damage 
by the virus). Successful treatment with anti- 
retroviral drugs suppresses viral load and in- 
terrupts the decline in CD4 counts that would 
otherwise lead to AIDS. Both viral load and rate 
of CD4 cell decline are heritable properties— 
that is, these properties are causally affected by 
viral genetics, leading to correlation between 
an individual and whomever they infect (17-27). 
It has therefore been expected that viral load 
and CD4 cell decline could change with the 
emergence of a new viral variant. We substan- 
tiate that expectation with empirical evidence 
by reporting a subtype-B variant of HIV-1 with 
exceptionally high virulence that has been cir- 
culating within the Netherlands during the 
past two decades. 


Discovery of the highly virulent variant 


Within an ongoing study (the BEEHIVE pro- 
ject; www.beehive.ox.ac.uk), we identified a 
group of 17 individuals with a distinct subtype-B 
viral variant, whose viral loads in the set- 
point window of infection (6 to 24 months 
after a positive test obtained early in the 
course of infection) were highly elevated 
(Table 1, middle column). BEEHIVE is a study 
of individuals enrolled in eight cohorts across 
Europe and Uganda, who were selected be- 
cause they have well-characterized dates of 
infection and samples available from early 
infection, for whom whole viral genomes were 
sequenced. The 17 individuals with the dis- 
tinct viral variant comprised 15 participants 
in the ATHENA study in the Netherlands, 
1 from Switzerland, and 1 from Belgium. See 
materials and methods for details on the 
initial discovery. 


Replication of the discovery in Dutch 
ATHENA data 


To replicate the finding and to investigate this 
viral variant in more detail, we then analyzed 
data from 6706 participants in ATHENA with 
subtype-B infections (expanding on the subset
of 521 participants in ATHENA who were 
eligible for inclusion in BEEHIVE). We found 
92 additional individuals infected with the viral 
variant, bringing the total to 109 such indi- 
viduals in either dataset. When replicating the 
BEEHIVE test with the ATHENA data (Table 1, 
right column), we again observed a large rise 
in viral load in individuals with this viral variant: 
an increase of 0.54 log10 viral copies/ml (i.e., a
~3.5-fold increase). The effect size was the same 
in a linear model including age at diagnosis and 
sex as covariates, and persisted in newly diag- 
nosed individuals over time (Fig. 1A). Hence- 
forth, for brevity, we refer to this viral variant 
as the “VB variant” (for virulent subtype B), to 
individuals infected with this variant as “VB 
individuals,” and to individuals infected with a 
different strain of HIV as “non-VB individuals.” 
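
As a quick check of the arithmetic in this paragraph, a difference on the log10 scale converts to a fold change by exponentiation. The following is a minimal illustrative sketch; the function name is an assumption introduced here and is not part of the study's analysis code.

```python
def log10_diff_to_fold(delta_log10: float) -> float:
    """Convert a difference in log10 viral load into a multiplicative fold change."""
    return 10.0 ** delta_log10


# 0.54 log10 copies/ml is the increase reported for the ATHENA replication.
fold = log10_diff_to_fold(0.54)
print(f"A 0.54 log10 increase is roughly a {fold:.1f}-fold increase")
```

Because the scale is logarithmic, even a modest-looking difference in log10 viral load corresponds to a severalfold difference in the concentration of viral particles.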


Search for closely related viruses 


To test whether the variant was more widely 
disseminated, we searched publicly available 
databases for similar HIV viral genotypes. All 
results had <95% sequence similarity to a rep- 
resentative viral sequence for the variant. 
Of the 17 VB individuals originally found in 
BEEHIVE, one was from the Swiss HIV Cohort 
Study (22) (SHCS). By examining previously 
published data (23), we found that three other 
individuals from the SHCS were closely re- 
lated (a phylogenetic distance below 2.5%). 
The high coverage of the Swiss HIV Cohort 
[including 89% of reported new infections 
from 2009 through 2018, with ~65% of the 
cohort sequenced (24)] makes it unlikely that 


many more VB individuals in Switzerland were 
undetected. Data to assess viral load or CD4 cell 
decline for these three individuals were not 
available, owing to early initiation of treatment. 


More-rapid CD4 cell decline 


At the time of diagnosis, CD4 counts for VB 
individuals were already lower than for non- 
VB individuals by 73 cells/mm³ [95% confidence
interval (CI): 12 to 134]. These counts
subsequently declined faster, by a further
49 cells/mm³ per year (CI: 20 to 79), in addition
to the decline for comparable non-VB
individuals [49 cells/mm³ per year (CI: 46
to 51) for men diagnosed at the age of 30 to 
39 years]. The VB variant is therefore asso- 
ciated with a doubling in the rate of CD4 cell 
decline. These values are averages estimated 
by using a linear mixed model adjusted for 
sex and age at diagnosis. Figure 1B illustrates 
the CD4 count decline that would be expected 
if disease progression were to continue lin- 
early in the absence of treatment. Initiating 
treatment at a CD4 count of 350 cells/mm³,
instead of immediately, was previously shown 
to substantially increase the subsequent hazard 
for serious adverse events (25). As seen in Fig. 
1B, this stage of CD4 cell decline is expected 
to be reached in 9 months (CI: 2 to 17) from 
the time of diagnosis for VB individuals, as 
opposed to 36 months (CI: 33 to 39) for non- 
VB individuals, in males diagnosed at the age 
of 30 to 39 years. It is reached even more 
quickly in older age groups, for which we found 
progressively lower CD4 counts at time of diagnosis
(table S1). At a CD4 count of 200 cells/mm³,
there is a high risk of immediate AIDS-
related complications; without treatment this 
stage of decline would be reached, on average, 
between 2 and 3 years after diagnosis for VB 
individuals and between 6 and 7 years after 
diagnosis for comparable non-VB individuals 
[the latter being similar to previous reports in 
Europe (26)]. 
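
The timelines quoted above follow from a simple linear extrapolation of the point estimates. The sketch below reproduces them under stated assumptions: the CD4 count at diagnosis for non-VB individuals is not given in this section, so it is back-calculated here from the reported 36 months needed to reach 350 cells/mm³, and the function and variable names are illustrative rather than taken from the study. The mixed model used in the study is not reproduced.

```python
def months_to_threshold(cd4_at_diagnosis: float,
                        decline_per_year: float,
                        threshold: float) -> float:
    """Months until a linearly declining CD4 count reaches `threshold`."""
    if cd4_at_diagnosis <= threshold:
        return 0.0
    return 12.0 * (cd4_at_diagnosis - threshold) / decline_per_year


# Point estimates quoted in the text (men diagnosed at age 30 to 39 years).
NON_VB_DECLINE = 49.0           # cells/mm^3 per year
VB_EXTRA_DECLINE = 49.0         # additional cells/mm^3 per year for VB
VB_DEFICIT_AT_DIAGNOSIS = 73.0  # cells/mm^3 lower at diagnosis for VB

# Assumption: non-VB baseline inferred from "36 months to reach 350".
non_vb_baseline = 350.0 + NON_VB_DECLINE * 36.0 / 12.0
vb_baseline = non_vb_baseline - VB_DEFICIT_AT_DIAGNOSIS
vb_decline = NON_VB_DECLINE + VB_EXTRA_DECLINE

for label, baseline, rate in [("non-VB", non_vb_baseline, NON_VB_DECLINE),
                              ("VB", vb_baseline, vb_decline)]:
    to_350 = months_to_threshold(baseline, rate, 350.0)
    to_200 = months_to_threshold(baseline, rate, 200.0) / 12.0
    print(f"{label}: {to_350:.0f} months to 350, {to_200:.1f} years to 200")
```

With these inputs the extrapolation gives about 9 months to 350 cells/mm³ and between 2 and 3 years to 200 cells/mm³ for VB individuals, and between 6 and 7 years to 200 cells/mm³ for non-VB individuals, in line with the values stated in the text.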

The effect of the VB variant on CD4 cell 
decline remained after we adjusted for the 
effect of higher viral load. With this adjust- 
ment, VB individuals have a CD4 count at 
diagnosis as would be expected given their 
high viral loads, but their subsequent decline 
in CD4 counts is again twice as fast as for
comparable non-VB individuals with high viral
loads: their rate of decline is accelerated by
44 cells/mm³ per year (CI: 16 to 72). Comparison
of this additional decline with that
expected from a +1 increase in log10 viral load,
15 cells/mm³ per year (CI: 11 to 18), shows that
the variant’s effect on CD4 count decline is
equivalent to that expected from a +3.0 increase
in log10 viral load. The same analysis of measurements
of CD4 percentages (the percentage
of all T cells that express CD4) showed that 
these also declined twice as fast for VB indi- 
viduals, and again this doubling in speed of dec- 
line remained when we adjusted for the higher 
viral load of the variant (table S2 and fig. S1). 
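
The equivalence between the variant's effect and a viral load increase is a ratio of two estimated slopes. The sketch below reproduces it from the rounded point estimates quoted above; the variable names are illustrative assumptions, and the CIs are ignored.

```python
# Rounded point estimates quoted in the text.
extra_decline_vb = 44.0          # cells/mm^3 per year, VB effect after adjusting for viral load
decline_per_log10_vl = 15.0      # cells/mm^3 per year, per +1 log10 viral load
observed_log10_increase = 0.54   # observed viral load increase in the ATHENA replication

# Viral load increase that would be needed to produce the same extra decline.
equivalent_log10_increase = extra_decline_vb / decline_per_log10_vl
ratio_to_observed = equivalent_log10_increase / observed_log10_increase

print(f"Equivalent increase: +{equivalent_log10_increase:.2f} log10 viral load")
print(f"That is about {ratio_to_observed:.1f} times the observed increase")
```

The rounded inputs give approximately +2.9; the +3.0 reported in the text presumably reflects the unrounded model estimates. Either way, the required increase is several times larger than the observed one, which is why the higher viral load alone does not account for the faster decline.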


No difference in CD4 cells after treatment, or 
in mortality 


Measurements of treatment success include 
CD4 cell recovery and mortality. CD4 counts 
and percentages after treatment initiation were 


Table 1. Comparison of viral loads between individuals infected with the VB viral variant and other individuals. When analyzing the viral loads of
individuals in the ATHENA study, we first excluded individuals who were in BEEHIVE, so that the test would be independent of the initial finding within the
BEEHIVE study. After our statistical tests of viral load, we did not exclude BEEHIVE individuals from the ATHENA data for subsequent analyses. N, number of
individuals after those without viral load measurements before treatment were excluded; IQR, interquartile range.

| Test | Viral load measurements compared | Mean and IQR of viral load in non-VB individuals, in log10 copies per milliliter | Mean and IQR of viral load in VB individuals, in log10 copies per milliliter | P value for increase |
| --- | --- | --- | --- | --- |
| Discovery [BEEHIVE dataset (Europe)] | Set-point viral loads for N = 15 VB individuals and N = 2446 individuals with any other HIV-1 strain | 5.10 (IQR: 4.69 to 5.58) | 5.84 (IQR: 5.57 to 6.09) | 5 × 10^[exponent illegible] (two-tailed t test, significant at a level of 5 × 10^[exponent illegible] when Bonferroni-corrected for performing 50 such tests) |
| Replication [ATHENA dataset (Netherlands), excluding overlap with BEEHIVE] | Mean pretreatment viral loads for N = 91 VB individuals and N = 5272 individuals with any other subtype-B HIV-1 strain | 4.79 | [illegible] | [illegible] (one-tailed t test) |

Wymant et al., Science 375, 540-545 (2022)

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Fig. 1. Clinical characteristics of VB individuals. Those infected with the highly 
virulent variant (VB individuals) are represented in red; those infected with 

any other subtype-B virus (non-VB individuals) are shown in blue. (A) Box-and- 
whisker plots of viral load, by year of diagnosis. Diagnosis dates were grouped 

to produce boundaries that coincide with years and roughly equal numbers 

of VB individuals (39 in 2002-2006, 35 in 2007-2008, and 27 after 2008; the 
pattern is robust to other groupings). (B) Expected decline in CD4 count in the 


similar for VB and non-VB individuals, as mea- 
sured with both linear mixed modeling of the 
CD4 dynamics (tables S3 and S4 and fig. S2)
and an individual-matching procedure. The 
hazard for death (from any cause) was also 
similar: VB individuals had a relative hazard 
of 1.4 (CI: 0.7 to 2.8, P = 0.35, Cox proportional 
hazards model). Our study had statistical 
power to detect only very large differences in 
mortality, as reflected in the wide CI for rela- 
tive hazard for death and shown in Fig. 1C. VB 
individuals had similar CD4 counts and mortality 
after treatment despite a faster CD4 cell de- 
cline before treatment; this could be explained 
by their tendency to start treatment sooner 
after diagnosis (fig. S3). For example, although 
the probability of having started treatment 
was estimated to be similar 6 months after 
diagnosis [42% (CI: 41 to 44%) for non-VB 
individuals compared with 46% (CI: 35 to 
54%) for VB individuals], it was different 
2 years after diagnosis [65% (CI: 64 to 67%) for 
non-VB individuals and 93% (CI: 85 to 96%) 
for VB individuals]. Had VB individuals not 
started treatment earlier than others, lower 
CD4 counts at treatment initiation would have 
been expected, potentially causing increased 
morbidity and mortality (25). This informa- 
tion could be relevant if VB or variants like it 
are found in settings with less widespread 
availability of HIV care. 


Characteristics of individuals infected with 
the VB variant 


VB individuals were mostly (82%) men who 
have sex with men, similar to non-VB individuals
(76%). Age at diagnosis was also similar
for VB and non-VB individuals (fig. S4). Neither 
ethnicity nor host genotype data were available, 
but the place of birth was mostly recorded as 
Western Europe for both groups (71% for non- 
VB individuals, 86% for VB individuals). VB 
individuals were present in all regions of the 
Netherlands, but with a different distribution 
relative to that of non-VB individuals (N = 102
versus 6604 individuals, P < 10⁻⁷, simulated
Fisher’s exact test): VB individuals were more 
common in the south (25% of VB individuals 
versus 6% of non-VB individuals) and less 
common in Amsterdam (20% versus 51%), as 
shown in table S5. Table S6 lists the hospitals 
included in each region. The average time from 
infection to diagnosis, for men who have sex 
with men in this cohort diagnosed in the late 
2000s, was previously estimated to be 3.6 years 
(CI: 3.3 to 4.0) (27). 


Genotype of the VB variant 


Sequence data from the BEEHIVE project are 
whole-genome data, providing the 17 whole 
genomes available for the variant; sequence 
data from ATHENA are partial pol gene data 
only, available for the additional 92 VB in- 
dividuals. We subtyped the 17 whole genomes 
for the variant as pure subtype B [with 100% 
support from two concordant methods (28, 29)], 
like most HIV-1 in the Netherlands. We pre- 
dicted co-receptor usage from the 17 whole 
genomes using two concordant methods 
(30, 31): one was likely CXCR4-tropic; the other 
16 were likely CCR5-tropic. Only one drug- 
resistance mutation was common for the VB 
absence of treatment. The model was adjusted for sex and age at diagnosis;
values shown are for males diagnosed at the age of 30 to 39 years. Shaded regions
indicate 95% CIs in the model's prediction of mean values, given the uncertainty
in estimation of parameter values (they do not reflect the variability between
individuals in each of the two groups, which is much greater). The dashed black
line denotes a CD4 count of 350 cells/mm³ (see text for details). (C) Probability
of still being alive at a given time after diagnosis.


variant: Met⁴¹→Leu (M41L), present in 91 of
109 partial pol gene sequences. Without other 
linked resistance mutations, M41L causes only 
low-level resistance to zidovudine (32, 33). 
Two of the whole genomes were found to 
be recombinants between the VB variant and 
another subtype-B cluster in ATHENA (con- 
taining a small amount of sequence from the 
latter) and were excluded from subsequent 
sequence analysis. Among whole genomes 
in BEEHIVE and all whole genomes in the 
Los Alamos National Laboratory HIV Database 
(www.hiv.lanl.gov), none appeared to be a 
candidate for a “recombination parent” of 
the VB variant—i.e., the many mutations that 
distinguish the VB variant from any other 
known virus appear to have arisen de novo, 
not through recombination. 

We compared the consensus sequence for 
the VB variant with the consensus of all Dutch 
subtype-B sequences in BEEHIVE, at both 
the amino acid and the nucleotide level: There 
were 250 amino acid changes and 509 nu- 
cleotide changes, as well as insertions and 
deletions. These alignments are included as 
data S1, and the amino acid alignment is il- 
lustrated in fig. S5. The distribution of nu- 
cleotide changes over the genome is in line 
with expectations (for example, fewer in the 
conserved pol gene region and more in the 
variable env gene region; see fig. S6). The VB- 
variant genotype is thus characterized by 
many mutations spread through the genome, 
meaning that a single genetic cause for the 
enhanced virulence cannot be determined from 
the current data. 
We conducted descriptive analyses of the
mutations that distinguish the VB variant 
from the Dutch subtype-B consensus. All of 
the amino acid-level changes are listed in data 
S2 with annotations. Of the observed amino
acid substitutions, 30 were previously shown 
to be positively associated with escape from 
cytotoxic T lymphocyte (CTL) response for at 
least one human leukocyte antigen type, and 
13 were shown to be negatively associated (34). 
To provide context for these numbers, within 
Dutch subtype-B data in BEEHIVE we defined 
16 other clades that are similar to the lineage 
in size (see materials and methods). For each 
clade, we calculated the amino acid consensus 
sequence, compared this to the Dutch subtype-B 
overall consensus, and determined CTL escape 
mutations. This showed that the number of 
such mutations for the VB variant is typical 
when normalized by its overall level of diver- 
gence (fig. S11). We also calculated the ratio 
of rates of nonsynonymous and synonymous 
changes (dN/dS) for each gene, for the VB variant
and the other 16 Dutch subtype-B clades
used for comparison. The VB variant had lower
dN/dS values than all of the other clades for
env, pol, and tat, though its values were not
extreme; for the other genes, its dN/dS value
was in the range spanned by the other clades 
(fig. S12). Finally, at codon position 77 of 
the protein Vpr, the consensus of all Dutch 
subtype-B sequences in BEEHIVE is gluta- 
mine, whereas the VB consensus is arginine. 
Glutamine was previously found to be more 
common in long-term nonprogressors, and 
mutation to arginine increased T cell apoptosis 
in vitro and strongly increased T cell decline in 
mouse models (35). However, both alleles have 
been commonly observed in subtype B to 
date (of 2178 subtype-B Vpr protein sequences 
in the Los Alamos National Laboratory HIV 
Database, 52% have glutamine and 36% have 
arginine), making it implausible that this mu- 
tation alone is the dominant mechanism for 
the virulence effect we observed. 


Evolution of the VB variant 


The maximum-likelihood phylogeny in Fig. 2A 
shows the VB variant in the context of back- 
ground sequences, demonstrating that it is a 
distinct genetic cluster characterized by high 
viral loads. The phylogeny was inferred from 
15 whole-genome VB-variant sequences and 
100 randomly chosen whole-genome subtype-B 
background sequences from BEEHIVE. Figure 
2B shows a dated phylogeny for VB-variant 
sequences only, estimated by using BEAST 
(36) and partial pol sequences. This phylogeny is 
colored by region, inferred with an ancestral 
state reconstruction by parsimony (minimizing 
changes of region). Amsterdam was assigned 
to the most recent common ancestor in 97% of 
trees in the posterior, showing that this re- 
construction was robust to the uncertainty in 
the phylogeny. All VB-variant sequences date
from 2003 onward; the time of their most recent 
common ancestor (TMRCA) was estimated as 
1998.0 (95% credibility interval: 1995.7 to 2000.1). 
Trees were visualized by using ggtree (37). 
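
To illustrate what an ancestral state reconstruction by parsimony does, the sketch below implements the bottom-up pass of Fitch's small-parsimony algorithm on a rooted binary tree. This is an assumption about the general technique, not the study's actual pipeline: the text states only that parsimony minimizing changes of region was used. The tree, tip names, and region labels are hypothetical toy data.

```python
from typing import Dict, Set, Tuple, Union

# A tree is either a tip name or a pair of subtrees (rooted, binary).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, tip_state: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at the root of `tree`, minimum number of state changes)."""
    if isinstance(tree, str):
        return {tip_state[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, tip_state)
    right_states, right_changes = fitch(right, tip_state)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no change needed at this node.
        return shared, left_changes + right_changes
    # The children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical four-tip tree with a region assigned to each tip.
toy_tree: Tree = (("tip1", "tip2"), ("tip3", "tip4"))
toy_regions = {"tip1": "Amsterdam", "tip2": "Amsterdam",
               "tip3": "South", "tip4": "Amsterdam"}

root_states, n_changes = fitch(toy_tree, toy_regions)
print(root_states, n_changes)
```

In this toy example the root is assigned Amsterdam with a single change of region along the tree. Repeating such a reconstruction over many trees sampled from a posterior, and recording how often each region is assigned to the root, gives a measure of robustness analogous to the 97% reported above.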


Phylodynamics of the VB variant 


The effective population size (Ne) of a pathogen
is indicative of the number of infectious
people. For the VB variant, Ne was estimated
by using a skygrid demographic model (38)
in BEAST and is shown in Fig. 2C (scaled by
the coalescent generation time τ). Ne increased
until roughly 2010; after this, there is more
uncertainty but a possible downward trend
[which may be an artefact of Ne inference
methods in the recent past (39)]. The proportion
of VB-variant cases among all new
subtype-B cases increased until a peak in 
2008 and subsequently decreased, though 
again with appreciable uncertainty [absolute 
numbers of both VB and non-VB diagnoses 
in our dataset have been decreasing since 
roughly 2008, and the data are right-censored 
by several years (fig. S7)]. In a recent analysis 
of an updated version of the ATHENA dataset 
(40), 33 additional VB individuals were found, 
which suggests that VB diagnoses were stable 
until roughly 2013 and have since been declining, 
still with appreciable uncertainty. 



We calculated the local branching index 
(LBI), which is a measure of fitness (41). For 
HIV in a context in which most individuals 
start treatment without long delays, the LBI 
is closely related to transmissibility (see sup- 
plementary text). Compared with that of other 
transmission clusters, the LBI was higher for 
the VB variant both in BEEHIVE (P = 2 x 107”) 
and ATHENA (P < 2 x 107”; fig. S8). High
pretreatment transmissibility may explain why 
the VB variant grew to be the 10th largest of 
1783 clusters in the full ATHENA tree. 


Tree imbalance and evolution within the 
VB-variant clade 


We found nothing unusual in the extent to 
which the VB variant’s phylogeny is imbalanced, 
nor did we detect any indication of further 
evolution of viral load within the variant’s clade 
(supplementary text and fig. S9). 


The first sampled VB individual 


We retrieved and sequenced two additional 
samples from the VB individual who was di- 
agnosed in 1992, 10 years before subsequent 
diagnoses of other VB individuals. Phyloge- 
netic analysis suggested that this individual 
was infected with a virus that had evolved most 
of the way, but not entirely, toward VB-variant 
viruses typical of later dates (supplementary 
text and fig. S10). This individual was diagnosed 
in Amsterdam, consistent with the afore- 
mentioned ancestral reconstruction of region. 
In the 10 years before this first VB diagnosis, 
the proportion of individuals diagnosed in the 
Netherlands for whom a viral sequence was 
available was roughly one-third. The proportion
among all infected individuals, diagnosed or undiagnosed, would
be smaller still. This means that the infector of
the 1992 individual was most likely not sampled, 
and indeed two or three steps in the transmis- 
sion chain could have been unsampled. The long 
phylogenetic branch leading to the 1992 indi- 
vidual could therefore represent between-host 
evolution, not necessarily within-host evolution 
in a single individual. 


Discussion 


Previous studies of the heritability of viral load 
and CD4 cell decline led us to expect that these 
properties could change with the emergence 
of a new variant of HIV-1. We provided strong 
evidence for this, discovering a virulent sub- 
type-B variant (the VB variant) that has been 
circulating in the Netherlands since the late 
1990s. We characterized the variant’s geno- 
type and evolutionary history, as well as its as- 
sociation with high viral loads, rapid decline 
of CD4 cells, and increased transmissibility. 
We found 109 individuals with the variant 
(VB individuals) whose age, sex, suspected 
mode of transmission, and region of birth 
are all typical for people living with HIV in 
the Netherlands. This suggests that the observed
association is causal: The increased
virulence is a property of the virus rather than 
a confounding property of individuals in this 
transmission cluster. An absence of viral load 
evolution inside the clade of VB variants sug- 
gests that the increased virulence is a property 
of the whole clade and not a subset of it—i.e., 
that the virulence evolution occurred on the 
long phylogenetic branch that connects this 
clade to other known viruses. 

Deferring the initiation of treatment until 
the measurement of a CD4 count’s decline to 
<350 cells/mm³ or the onset of AIDS, instead of
immediate treatment initiation upon diagno- 
sis, was previously shown to increase the 
subsequent hazard of serious AIDS-related 
events by a factor of 3.6 (CI: 2.0 to 6.7) and of 
any serious event (including death) by a factor 
of 2.4 (CI: 1.6 to 3.3) (25). This long-lasting
immunological damage justifies WHO’s
classification of 350 CD4 cells/mm³ as “advanced
HIV” (www.who.int/hiv/pub/guidelines/HIVstaging150307.pdf).
Without treatment,
advanced HIV is expected to be reached in 
only 9 months (CI: 2 to 17) from the time of 
diagnosis for VB individuals, compared with 
36 months (CI: 33 to 39) for non-VB individuals, 
in males diagnosed at the age of 30 to 39 years. 
Advanced HIV is reached even more quickly in 
older age groups, and there is considerable var- 
iation between individuals around these ex- 
pected values. Many individuals could therefore 
progress to advanced HIV by the time they are 
diagnosed, with a poorer prognosis expected 
thereafter in spite of treatment. In practice, 
there is still substantial variation in the delay 
from becoming infected to starting treatment, 
making the VB variant a concern even in the 
high-awareness and highly monitored context of 
the Dutch HIV-1 epidemic. In contexts with less 
awareness and monitoring, in which diagnosis 
often occurs later in infection, the probability of 
reaching advanced HIV before diagnosis would 
be even greater. 

Future in vitro investigations could more 
firmly establish the role of the viral genotype, 
and reveal an as-yet-unknown virulence mech- 
anism at the molecular or cellular level. A 
higher replicative capacity of the virus might 
be observed, given the increased viral loads 
seen here. However, it is likely that there will 
be more to the virulence mechanism: The VB 
variant doubles the rate of CD4 cell decline, 
measured with both counts and T cell percen- 
tages, even after adjusting for its higher viral 
load. This rate is equivalent to the acceleration 
of CD4 degradation that would be expected 
from a 3.0 log10 increase in viral load, though
we observed a 0.54 to 0.74 log10 increase.
This means that the virulence normalized by 
the amount of virus—the “per-parasite path- 
ogenicity” (42, 43), which for HIV is heritable 
(19)—is much higher for the VB variant. Using 
two aforementioned methods, we predicted 
that, of the 17 whole genomes available, 16 use
only the R5 co-receptor for cell entry, which is 
typical for subtype-B viruses in early infection 
(13). This finding suggests that the underlying 
virulence mechanism is distinct from the well- 
known effect of cell tropism (14, 15). 

Previous studies have reported population- 
wide increases (44, 45) and decreases (46) in 
virulence over time. Mixed results between 
individual studies [see (47) for a meta-analysis] 
can be attributed to differences in epidemic 
context (such as the dominant subtypes), statis- 
tical power, and observational biases over time. 
Temporal virulence trends could also be due 
to changing confounders, such as a shift in 
which subpopulations are most affected, the 
stage of infection at time of diagnosis, or coin- 
fections. We expand on these studies by re- 
solving a change in virulence to an individual 
viral variant. 

The basic theory of an infectiousness- 
virulence trade-off is that infectiousness 
and virulence are linked (for example, by how 
fast a pathogen replicates in its host) and that 
selection pressures favor intermediate values 
rather than extreme ones. If infectiousness is 
too low, the pathogen cannot be transmitted 
when its host contacts other hosts, but if 
virulence is too high, the host becomes too ill 
to have such contacts. In the case of HIV, the 
implication of this theory is that we would not 
expect highly virulent viruses to spread widely 
through a population in the absence of wide- 
spread treatment, because their hosts would 
progress to AIDS very quickly, limiting the 
opportunities for transmission (9). Most of 
the evolution that gave rise to the VB variant 
occurred before 1992, before effective combi- 
nation treatment was available. However, our 
findings may stimulate further interest in 
whether widespread treatment shifts the 
balance of the infectiousness-virulence trade-off
toward higher virulence, thus promoting the 
emergence and spread of new virulent variants. 
Previous modeling studies have investigated this 
idea for pathogens generally (48) and for HIV 
specifically (49, 50). We discuss subtleties of 
the argument in the supplementary text, but 
our conclusion is that widespread treatment 
is helpful to prevent new virulent variants, 
not harmful. The absolute fitness of viral 
variants must be considered in addition to 
their relative fitness, and treatment reduces 
the total onward transmission over the course 
of one infection, regardless of virulence. Put 
simply, “viruses cannot mutate if they cannot 
replicate” (anonymous), and “the best way to 
stop it changing is to stop it” [Marc Lipsitch (51)].
Early treatment also prevents CD4 cell decline 
from leading to later morbidity and mortality; 
thus clinical, epidemiological, and evolution- 
ary considerations are aligned. Our discovery of 
a highly virulent and transmissible viral variant 
therefore emphasizes the importance of access 
to frequent testing for at-risk individuals and of
adherence to recommendations for immediate 
treatment initiation for every person living with 
HIV (www.who.int/hiv/pub/arv/). 

Diversification of aliphatic C-H bonds in small
molecules and polyolefins through radical
chain transfer


Timothy J. Fazekas, Jill W. Alty, Eliza K. Neidhart, Austin S. Miller, Frank A. Leibfarth*, Erik J. Alexanian* 


The ability to selectively introduce diverse functionality onto hydrocarbons is of substantial value 
in the synthesis of both small molecules and polymers. Herein, we report an approach to aliphatic 
carbon-hydrogen bond diversification using radical chain transfer featuring an easily prepared 
O-alkenylhydroxamate reagent, which upon mild heating facilitates a range of challenging or 
previously undeveloped aliphatic carbon-hydrogen bond functionalizations of small molecules and
polyolefins. This broad reaction platform enabled the functionalization of postconsumer polyolefins
in infrastructure used to process plastic waste. Furthermore, the chemoselective placement of
ionic functionality onto a branched polyolefin using carbon-hydrogen bond functionalization
upcycled the material from a thermoplastic into a tough elastomer with the tensile properties of
high-value polyolefin ionomers.


The direct transformation of unreactive
aliphatic C-H bonds to useful functionality
is a streamlined and sustainable
approach to accessing complex molecules
and materials with enhanced
properties from readily available compounds 
(1-4). Late-stage diversification of drug-like 
molecules, in which complex substrates are 
modified selectively to alter their function, 
has emerged as a powerful strategy to access 
new lead compounds for medicinal chemistry 
and structure-activity relationship studies with- 
out resorting to de novo synthesis (5). Despite 
substantial progress, there remains a press- 
ing need for new aliphatic C-H diversifica- 
tion platforms that facilitate the site-selective 
introduction of a range of desirable function- 
ality to small-molecule substrates with sub- 
strate as the limiting reagent. 

An estimated 95% of the economic value of 
plastics is lost after a single use (6). Specif- 
ically, branched polyolefins represent >35% 
of polymers produced worldwide but undergo 
deleterious chain scission events during me- 
chanical reprocessing or polymer functionaliza- 
tion, which degrades their thermomechanical 
properties and contributes to their poor 
recycling rate (<5% in the United States) 
(7, 8). Developing synthetic methods to place 
desired functionality on postconsumer branched 
polyolefins would lead to performance- 
advantaged thermoplastics derived from single- 
stream or mixed plastic waste (9, 10). The new 
materials realized from such platform methods 
could serve as sustainable substitutes to current 
high-value materials that are derived from 
petrochemical resources, thus representing
an example of polymer upcycling (11).

Currently, a number of transformations of 
aliphatic C-H bonds exist and are used for 
the late-stage diversification of drug-like mol- 
ecules and commodity polymers, but most 
of these use either nearby directing groups 
to control reaction site selectivity or involve 
promiscuous reactive intermediates that lim- 
it the scope of these approaches (12, 13). A 
notable exception that uses substrate as the 
limiting reagent is the use of high-valent 
transition metal-oxo complexes in aliphatic 
C-H functionalization, but this approach is 
limited by the scope of accessible transforma- 
tions because of the use of highly oxidizing 
intermediates (14, 15). Intermolecular alkyla- 
tion or borylation of C-H bonds using rhodium 
catalysis is also well developed, but the require- 
ment for donor-acceptor diazo reagents for 
alkylation limits overall scope, and the use of a 
precious metal limits high-volume applications 
in polymer science (16, 17). Singlet carbenes
generated from the photochemical or thermal 
decomposition of diazirines represent an ef- 
ficient C-H functionalization strategy for 
polymer cross-linking and biopolymer photo- 
affinity labeling, but the required substitution 
pattern of the diazirine and limited functional 
group tolerance hinders broad applicability 
(18, 19). Furthermore, several valuable C-H 
transformations, such as aliphatic C-H iodi- 
nation and C-H methylation, remain limited 
regardless of approach. A universal strategy 
for aliphatic C-H functionalization, in which 
a wide array of functionality can be placed site 
selectively in an intermolecular transformation 
on both complex organic substrates and com- 
modity polymers, remains a grand challenge 
(Fig. 1A) (20). 
Recent studies have demonstrated the utility
of heteroatom-centered radicals to facilitate 
site-selective, intermolecular functionaliza- 
tions of unactivated aliphatic C-H bonds on 
a variety of small molecules and materials, 
constituting a complementary strategy to metal- 
catalyzed methods (21-26). These reactions
principally harness the capacity of a tuned, 
nitrogen-centered radical to achieve facile 
hydrogen atom transfer (HAT) from strong, 
unactivated aliphatic C-H sites. A critical 
drawback to these previous studies is the re- 
quirement for direct group transfer of the 
functionality appended to nitrogen, which 
greatly restricts the diversity of products 
accessible through the HAT platform. With 
this in mind, we hypothesized that decoupling 
the formation of the nitrogen-centered radical 
responsible for HAT from the chain transfer 
step would unlock a universal C-H diversi- 
fication manifold applicable to a vast range 
of transformations (Fig. 1B). We identified 
an O-alkenylhydroxamate (1) as an ideal reagent 
capable of forming reactive nitrogen-centered 
radicals while also manifesting slow enough 
chain transfer kinetics for external radical traps 
to outcompete it in substrate functionalization 
(Fig. 1C). We hypothesized that such a versatile 
C-H diversification strategy would encompass 
many important transformations, including 
ones inaccessible with current synthetic tech- 
nology, and extend to areas ranging from the 
late-stage C-H diversification of complex 
molecules to applications in the transforma- 
tion of postconsumer plastic waste to func- 
tional polyolefins. 

Our initial studies demonstrated the versatil- 
ity of easily accessed, shelf-stable O-alkenylhy- 
droxamate 1 for the intermolecular, aliphatic 
C-H diversification of a range of small mole- 
cules (Fig. 2). The C-H functionalizations 
promoted by reagent 1 proceeded simply upon 
mild heating (70°C) or visible light irradiation 
without the need for an exogenous initiator, 
which is an enabling aspect of the approach. 
The C-H diversification of cyclooctane with 
substrate as limiting reagent was successful 
using 10 diverse trapping agents in good to 
excellent yield, establishing the broad scope of 
the platform (2 to 11). Practical intermolecular, 
aliphatic C-H iodination sets the stage for a 
range of challenging C-H transformations (vide 
infra) (27). Whereas there are extant methods 
available for a subset of these reactions, ex- 
amples using substrate as the limiting reagent 
remain quite rare; commonly, the alkane is 
used in large excess (>5 equivalents) and often 
as a reaction solvent. Furthermore, there are 
no platforms for aliphatic C-H functionaliza- 
tion that rival the synthetic scope demonstra- 
ted herein with respect to both the diversity of 
accessible transformations and the viable 
substrates ranging from small molecules to 
postconsumer waste. Although we targeted 

Fig. 1. Aliphatic C-H diversification using N-functionalized amides. (A) A universal approach to C-H
diversification would enable the introduction of a range of desired functionality onto small molecules and commodity
polyolefins. (B) An N-functionalized amide and a diverse set of chain transfer agents constitute a general platform
for C-H diversification. (C) The mechanistic hypothesis for C-H diversification using O-alkenylhydroxamates
separates the HAT reagent from the chain transfer agent. Ar, 3,5-bis(trifluoromethyl)phenyl.


many synthetically valuable C-H transforma- 
tions, additional processes are easily envisioned 
using alternative radical traps. 

We next applied the C-H diversification 
to several representative small-molecule sub- 
strates. Diverse cyclic and linear hydrocarbons 
react efficiently using substrate as limiting 
reagent (12 to 22). The sterically dictated 
site selectivities controlled by the bulky N-tBu
amidyl radical favor accessible secondary C-H 
sites over weaker, tertiary C-H bonds, which 
are commonly the most reactive in C-H func- 
tionalizations (14 to 16, 19 to 22). For com- 
parison, prior efforts toward C-H diversification 
through HAT using photoredox catalysis 
strongly favored tertiary functionalization; 
such tertiary-selective functionalization is also 
characteristic of reactions involving HAT with 
sulfate radicals (28, 29). The transformation of 
the unreactive C-H bond of gaseous methane 
remains a considerable challenge for any 
C-H functionalization. The strong N-H bond 
(110.7 kcal/mol) of the parent amide of 1 
suggested that methane HAT (C-H bond
~105 kcal/mol) could be viable (30). As a 
demonstration of the notable reactivity of 
the amidyl radical in HAT, we successfully 
performed the (phenyltetrazole)thiolation 
of methane under our standard conditions 
to deliver 23 in 20% yield with respect to 1. 
Functionalized substrates containing electron- 
withdrawing groups (24 to 28) exhibit strong 
polar effects in discriminating between methylene
sites, with sites distal to the electron-withdrawing
group preferred (31). With respect
to the mechanism of reactions involving 1, a
C-H iodination competition experiment between
cyclohexane and d₁₂-cyclohexane proceeded
with a kH/kD of 6.4, consistent with
an irreversible aliphatic C-H HAT. Additionally,
the C-H (phenyltetrazole)thiolation reaction
produces the α-SPT acetophenone
byproduct, consistent with the chain transfer 
mechanism outlined in Fig. 1C. The notable 
sterically and electronically dictated site selec- 
tivities characteristic of this platform, when 
combined with the breadth of accessible C-H
transformations, enable a wealth of valuable 
late-stage diversifications of complex mole- 
cules as described below. 
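
The thermodynamic argument for methane HAT given above can be checked with a line of arithmetic. The following Python sketch estimates the enthalpy of the hydrogen atom transfer step as the bond dissociation energy (BDE) of the C-H bond broken minus the BDE of the N-H bond formed, using only the two values quoted in the text (about 105 kcal/mol for methane and 110.7 kcal/mol for the parent amide of 1). The function name `hat_enthalpy` is an illustrative assumption and is not taken from the original work, and the estimate assumes that entropic, solvent, and polar effects can be neglected; it says nothing about the kinetic barrier.

```python
def hat_enthalpy(bde_broken: float, bde_formed: float) -> float:
    """Rough enthalpy (kcal/mol) of X* + H-R -> X-H + R*.

    bde_broken: BDE of the substrate H-R bond that is cleaved.
    bde_formed: BDE of the X-H bond that is formed.
    A negative result means the HAT step is estimated to be exothermic.
    """
    return bde_broken - bde_formed


# Values quoted in the text (kcal/mol)
BDE_NH_PARENT_AMIDE = 110.7  # N-H bond of the parent amide of reagent 1
BDE_CH_METHANE = 105.0       # methane C-H bond (approximate)

delta_h = hat_enthalpy(BDE_CH_METHANE, BDE_NH_PARENT_AMIDE)
print(f"Estimated HAT enthalpy for methane: {delta_h:+.1f} kcal/mol")
```

Under these assumptions the step comes out mildly exothermic (about -5.7 kcal/mol), which is the sense in which the strong N-H bond suggests that methane HAT could be viable.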

We next examined the C-H functionaliza- 
tion of several representative natural products 
and drug derivatives to highlight the scope 
of our approach. The reactions of adamantyl 
substrates were highly efficient (29 to 32). 
The benzylic functionalization of ibuprofen 
methyl ester provided fluorination and tri- 
fluoromethylthiolation products 33 and 34, 
respectively, as single regioisomers in con- 
trast to previous C-H functionalizations of this 
substrate (28). Functionalization of terpenoid
and steroid natural products (complex molecules
with a multitude of aliphatic C-H sites)
favors the activated C-H site α to the
ether oxygen atom of (-)-ambroxide (35 and 
36), whereas reaction of deoxyandrosterone 
favored functionalization of the C2 position 
of the A ring (37) and reaction of trans- 
androsterone acetate favored a single dia- 
stereomer of a B-ring fluoride (38). For 
comparison, a previous C-H fluorination of this 
substrate with Selectfluor yielded greater than 
seven alkyl fluorides, with none formed in 
greater than 6% yield (32). Last, we performed 
several C-H functionalizations of the terpe- 
noid natural product (+)-sclareolide, favoring 
the most reactive A-ring methylene site (39 to 
45; for reaction optimization studies, see table 
S1). In each case, a single regioisomer was ob- 
tained in good to excellent yield with high (>10:1) 
diastereoselectivity, including the C-H iodina- 
tion which delivers iodide 40 in virtually quan- 
titative yield. The present platform thus offers a 
powerful tool for the late-stage introduction of 
fluorinated groups at unactivated aliphatic sites 
in complex molecules for modulating the ab- 
sorption, distribution, metabolism, and excre- 
tion properties of drug-like compounds. 

With respect to late-stage diversification, 
the versatility of our approach enables more 
valuable, yet rare, C-H transformations from 
now easily accessible, functionalized com- 
pounds. As a second step after the highly ef- 
ficient iodination of sclareolide (>95% yield), 
reaction with Me₂CuLi delivers the formal
C-H methylation product 46 in good yield 
as a single diastereomer, furnishing a two- 
step protocol to investigate “magic methyl” 
effects through late-stage, intermolecular meth- 
ylation of unactivated aliphatic C-H bonds 
(33, 34). Alternatively, iron-catalyzed cross- 
coupling of 40 with PhMgBr leads to the C-H 
arylation product 47; previous C-H arylation 
of this substrate required the use of super- 
stoichiometric amounts of (+)-sclareolide (35). 
Facile borylation of 40 using B₂cat₂ followed
by transesterification yielded 48 as a single 
product, another transformation with very 
limited precedent using substrate as the limiting 
reagent (36-38). Finally, the copper-catalyzed 

Fig. 2. C-H diversification of small molecules using reagent 1. Yields refer
to combined isolated products or were determined by gas chromatography
functionalizations are provided in examples involving minor regioisomers; see
the supplementary materials for reaction details. *GC yield. †¹H-NMR yield
using internal standard. ‡¹⁹F-NMR yield using internal standard. §Mixture
of diastereomers; see the supplementary materials. ||Irradiated with blue light.

cross-coupling of 40 with a primary alkyl
amine delivered the C-H amination product 
49, constituting a formal dehydrogenative 
alkane-amine coupling (39). Other attractive 
C-H transformations are also easily envi- 
sioned capitalizing on the versatility of the 
phenyltetrazole sulfone group, which can be
easily accessed from product 45 (40). 

n performed under 50 atm methane.

chromatography (HT-GPC) was conducted at 140°C in trichlorobenzene. *Reaction time was 10 min.

We envisioned that this versatile C-H diversification
strategy could unlock numerous transformations
on branched polyolefins. Commercial
approaches to polyolefin functionalization proceed
through high-energy radical processes
that selectively abstract tertiary C-H bonds
in branched polymers, resulting in β-scission
processes that deteriorate thermomechanical
properties. We hypothesized that the high
regioselectivity of HAT involving reagent
1 favoring methylene sites would prevent
polymer chain scission by eliminating the
formation of tertiary radicals during reactive
processing, and the generality of this method
would enable access to a range of branched
polyolefins with polar functionality. Such polar
polyolefins, which are inaccessible using traditional
Ziegler-Natta or metallocene catalysis,
enhance interfacial adhesion and provide sites
for controlled polymer deconstruction (41).
Linear low-density polyethylene (LLDPE; Dow
DNDA-1081) was chosen as a model branched
polyolefin to exemplify this method (melting
temperature of 122°C; 36 branches per 1000
carbons). As a representative transformation
to introduce polar functionality incompatible
with early transition metal catalysts, cyanation
of LLDPE with 1 under homogeneous conditions
(130°C in chlorobenzene) proceeded efficiently
with selectivity for methylene sites
and involved no discernable chain scission,
as confirmed by size-exclusion chromatography
(SEC) and a variety of one- and two-dimensional
nuclear magnetic resonance (NMR)
techniques (Fig. 3A and figs. S1 and S12 to S15).
More precise analysis of selectivity was obtained
using a narrow-dispersity PE (NDPE), made
through the reduction of poly(1,4-butadiene).
The SEC chromatogram was virtually identical
before and after functionalization, demonstrat- 
ing the lack of chain scission or long-chain 
branching accompanying polymer function- 
alization (Fig. 3C). By contrast, an analogous 
cyanation using dicumyl peroxide as a radical
initiator in place of 1 yielded no functionaliza- 
tion and a decrease in polymer molecular 
weight. All polymer functionalizations target a 
maximum of 10 mol % repeat-unit modifica- 
tion to add functionality while maintaining the 
beneficial semicrystalline nature of the material. 

In addition to polyolefin cyanation, the in- 
stallations of fluoride, bromide, iodide, tri- 
fluoromethylthiol, thiophenyl, azido, and 
(phenyltetrazole)thiol groups onto LLDPE 
exemplified the versatility of this approach. 
Several of these polyolefin C-H transforma- 
tions deliver products inaccessible by other 
means (24, 25, 42, 43). To further extend the 
scope, C-H cyanation, thiophenylation, and 
iodination were successful on complementary 
substrates, including highly crystalline high- 
density PE (HDPE), branched LDPE (41 branches 
per 1000 carbons), postindustrial waste PE 
(PIPE) remnants from packaging forms, and 
postconsumer waste PE (PCPE) obtained from 
PE foam packaging (Fig. 3B). Furthermore, 
thiophenylation of isotactic polypropylene
(500 branches per 1000 carbons) proceeded 
successfully without discernable chain scis- 
sion (fig. S9), demonstrating the value of this 
method for these tough and highly branched 
thermoplastics. It is notable that functional- 
ization proceeded efficiently even with an un- 
defined mixture of oxidation by-products and/or 
additives in PCPE evident by infrared and
¹H-NMR spectroscopy, indicating the tolerance
of this method to common impurities in plas- 
tic waste. 

The ability to place diverse functionality onto 
polyolefins through this universal approach 
provides an opportunity to substitute current 
high-value plastics, and create new ones, using 
postconsumer waste as a starting material. 
Polyolefin ionomers such as SURLYN are a 
high-value class of thermoplastics toughened 
by ionic cross-links, with applications ranging 
from structural adhesives to ion-conducting 
membranes (44). However, SURLYN is synthe- 
sized through radical copolymerization of 
acrylic acid and ethylene, which limits polymer 
architecture to a highly branched microstructure,
precludes use of α-olefins as comonomers,
and limits functional group identity
to a carboxylate. These limitations compromise 
the potential strength, toughness, and transport
properties of the materials. There are
currently limited strategies to prepare poly- 
olefin ionomers on materials made through 
Ziegler-Natta or related catalytic approaches 
(i.e., LLDPE or HDPE). Given the structural 
fidelity and lack of long-chain branching of 
our polyolefin functionalization approach, 
we envisioned creating ionomers from poly- 
olefins through late-stage functionalization. 
The generality of the C-H functionalization 
mediated by 1 enabled the development of 
a 2-bromoethyl thiosulfonate radical trap- 
ping reagent that installed a primary bromide 
onto the polyolefin (P23; Fig. 4A). Displace- 
ment of the bromide by methyl imidazole 
yielded imidazolium-functionalized LLDPE 
(P24), which represents a formal copolymerization
of α-olefins with an ion-containing
vinyl monomer. The ionomers had distinct 
properties from the parent LLDPE, includ- 
ing solubility in polar aprotic solvents, a de- 
creased melting temperature, and enhanced 
clarity (fig. S25). Introduction of the imida- 
zolium to only 2 mol % of the repeat units 
substantially changed the material from a 
thermoplastic to a tough elastomer (Fig. 4B). 
Although yield stress and the Young’s modulus
(E) of P24 decreased compared with
the parent LLDPE, the strain at break (εB)
quadrupled and the stress at break (σB) more
than doubled, leading to an increase in the
tensile toughness (UT) of >550%. These tensile
properties compare favorably to a commercial
sample of Dow SURLYN, demonstrating
the marked effect that a small amount of tar- 
geted functionalization can have on material 
properties. 
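
Tensile toughness is conventionally taken as the area under the stress-strain curve up to the break point, which is why a large gain in strain at break can outweigh a lower yield stress and modulus. The Python sketch below illustrates that calculation with trapezoidal integration. The two curves are hypothetical placeholder data chosen only so that strain at break quadruples and stress at break more than doubles; they are not measurements from this work, and the function name `tensile_toughness` is likewise an assumption for illustration.

```python
from typing import Sequence


def tensile_toughness(strain: Sequence[float], stress: Sequence[float]) -> float:
    """Area under a stress-strain curve by the trapezoidal rule.

    With stress in MPa and strain as a dimensionless fraction,
    the result is in MJ/m^3.
    """
    if len(strain) != len(stress) or len(strain) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    area = 0.0
    for i in range(1, len(strain)):
        width = strain[i] - strain[i - 1]
        area += 0.5 * (stress[i] + stress[i - 1]) * width
    return area


# HYPOTHETICAL curves (not data from the paper), ending at the break point.
parent_strain = [0.0, 0.1, 1.0, 2.0]
parent_stress = [0.0, 10.0, 9.0, 12.0]     # MPa
ionomer_strain = [0.0, 0.5, 4.0, 8.0]      # strain at break is 4x the parent
ionomer_stress = [0.0, 5.0, 12.0, 26.0]    # MPa; softer start, higher break stress

u_parent = tensile_toughness(parent_strain, parent_stress)
u_ionomer = tensile_toughness(ionomer_strain, ionomer_stress)
percent_increase = (u_ionomer / u_parent - 1.0) * 100.0

print(f"parent:  {u_parent:.2f} MJ/m^3")
print(f"ionomer: {u_ionomer:.2f} MJ/m^3")
print(f"increase in toughness: {percent_increase:.0f}%")
```

Even with a lower initial slope and yield stress, the made-up ionomer curve encloses several times the area of the parent curve, which is the qualitative behavior described for P24.
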
Collectively, the ability to produce an ionomer
from a postconsumer waste stream with
functional equivalence to the thermomechanical
properties of a high-value commercial
material makes this upcycled material a potentially
environmentally sustainable substitute
for polyolefin ionomers (45). The transla- 
tional potential of this method was further 
demonstrated through C-H functionalization 
of PCPE in a twin-screw extruder, which is 
the infrastructure used for processing plastic
waste. Reacting reagent 1 with a 5 mol %
2-bromoethyl thiosulfonate radical trapping
reagent, we procured 7 g of 1 mol % bromoethylthiolated
PCPE (P25; Fig. 4C). Reaction
of the extruded material with methyl imid- 
azole afforded a large-scale synthesis of the 
polyolefin ionomer. Although further reagent 
development is required to make this mate- 
rial an economically sustainable substitute, 
this C-H functionalization platform enables 
access to a library of polyolefin ionomers, 
among other materials, from plastic waste. 
These ionomers can be systematically studied 
to assess the impact of ion identity, ion con- 
tent, and polymer branching on polyolefin 
properties and circularity, and could ulti- 
mately contribute to a more sustainable plas- 
tics economy. 

Multiscale engineered artificial tooth enamel


Hewei Zhao, Shaojia Liu, Yan Wei, Yonghai Yue, Mingrui Gao, Yangbei Li, Xiaolong Zeng,
Xuliang Deng, Nicholas A. Kotov, Lin Guo, Lei Jiang


Tooth enamel, renowned for its high stiffness, hardness, and viscoelasticity, is an ideal model
for designing biomimetic materials, but accurate replication of complex hierarchical organization of
high-performance biomaterials in scalable abiological composites is challenging. We engineered
an enamel analog with the essential hierarchical structure at multiple scales through assembly
of amorphous intergranular phase (AIP)-coated hydroxyapatite nanowires intertwined with
polyvinyl alcohol. The nanocomposite simultaneously exhibited high stiffness, hardness, strength,
viscoelasticity, and toughness, exceeding the properties of enamel and previously manufactured
bulk enamel-inspired materials. The presence of AIP, polymer confinement, and strong interfacial
adhesion are all needed for high mechanical performance. This multiscale design is suitable
for scalable production of high-performance materials.


Effective combination of diverse mechanical
properties is highly desirable for
engineering applications but is difficult
to realize (1), especially for properties that
require contradictory material design elements
such as high stiffness, hardness, viscoelasticity,
strength, and toughness (2-4). Tooth
enamel—the outer shell of teeth with a thick- 
ness of several millimeters (Fig. 1A)—is the 
hardest tissue in the human body and exhibits 
excellent resistance to deformational and vi- 
brational damage (5, 6). This unusual com- 
bination of properties originates from enamel’s 
hierarchical architecture, which is made up of 
96 wt % hydroxyapatite (HA) nanowires inter- 
connected by confined biomolecules (7). Most 
crystalline segments in HA nanowires in nat- 
ural enamel are interconnected by amorphous 
intergranular phase (AIP, Mg-substituted amor- 
phous calcium phosphate) (8), which considerably 
influences the mechanical performance of 
enamel (9, 10). Efforts have been made to mimic 
the parallel arrangement of nanowires in 
enamel to improve the stiffness, hardness, or 
viscoelasticity (2, 11-14) of nanocomposites,
but physical forms are usually limited to coat- 
ings with submillimeter thicknesses. It has 
proven difficult to assemble analogs of enamel 
that retain full structural complexity of the 
biological prototype with several essential struc- 
tural elements responsible for their mechanical 
and biological functions (i.e., nanowire align- 
ment, presence of AIP, and the confined or- 
ganic matrix) as bulk machinable materials. 
The hierarchical structure of tooth enamel 
provides a biomimetic blueprint. Self-assembled 
HA nanowires with 30- to 50-nm diameters 
align with each other, forming nanocolumns
(Fig. 1B) that represent the key structural motif
of enamel. The AIP layer closely connects to 
the HA nanowires and has a thickness of 3 to 
10 nm (Fig. 1, C and D, and figs. S1 and S2). This
interface characterization implies that there 
are strong chemical bonds between the AIP 
layer and HA nanowires, which enhance the 
interface connectivity and contribute to mechanical
improvement (Fig. 1E and fig. S3).

Our materials are made from aligned HA 
nanowires coated with amorphous ZrO₂ serving
as the AIP. First, HA nanowires with length
~10 μm and diameter ~30 nm [Fig. 1F (left) and
fig. S4] were synthesized by the solvothermal
method (15). The HA nanowires grew along the
[001] direction with no obvious defects (fig. S5) 
and were then coated with a ~3-nm amorphous
layer of ZrO₂ (A-ZrO₂) through in situ
hydrolysis of Zr precursors, followed by subsequent
annealing to form the interfaces between
the ceramics in the crystalline and
amorphous phases [Fig. 1F (middle)]. The 
geometry and morphology of the HA nano- 
wires were retained, and the amorphous layer 
was tightly connected to the crystalline core of 
HA [Fig. 1F (right)]. The amorphous state and 
constituents of the coated layer were demon- 
strated by high-resolution transmission electron 
microscopy [Fig. 1F (right)], energy dispersive 
x-ray spectroscopy (EDS) mapping (fig. S6), 
and x-ray diffraction patterns (fig. S7). The 
abundance of surface groups -OH and PO₄³⁻ on
HA absorbed Zr⁴⁺, contributing to the formation
of the thin, amorphous ZrO₂ layer (Fig. 1F).
To verify the role of the AIP in mechanical 
performance enhancement, in situ tensile tests 
were performed on HA and A-ZrO₂-coated
HA (HA@A-ZrO₂) nanowires with a push-to-pull
(fig. S8) platform and a Picoindenter 85
nanoindenter in an environmental scanning 
electron microscope (ESEM). The fracture 
strength and strain of HA@A-ZrO₂ nanowires
are ~1.6 GPa and ~6.2%, respectively
(Fig. 1G), which are 2.5 and 1.6 times as high 
as those of HA nanowires (~0.65 GPa and 
~4%, respectively), surpassing the mechanical 
properties of bulk HA (16). Detailed observation
of the tensile process of the HA@A-ZrO₂
nanowire reveals that the nanowires can 
endure tensile deformation as large as ~5.2% 
before fracture (fig. S9), whereas the value of 
HA is ~2.5%. The fracture surface of the
HA@A-ZrO₂ nanowire shows crack deflection (fig.
S10) instead of the brittle failure usually seen
in brittle ceramics, which contributed to frac- 
ture strain improvement due to the presence 
of the amorphous layer. 
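
The "2.5 and 1.6 times" comparison between coated and uncoated nanowires follows directly from the four approximate values quoted above. The short Python sketch below reproduces that arithmetic; the helper name `fold_change` and the dictionary keys are illustrative assumptions, and the inputs are the rounded figures from the text rather than raw measurements.

```python
def fold_change(value: float, reference: float) -> float:
    """Return value / reference, rejecting a non-positive reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return value / reference


# Approximate values quoted in the text (each is preceded by "~" there)
COATED = {"strength_gpa": 1.6, "strain_pct": 6.2}     # HA@A-ZrO2 nanowires
UNCOATED = {"strength_gpa": 0.65, "strain_pct": 4.0}  # bare HA nanowires

strength_ratio = fold_change(COATED["strength_gpa"], UNCOATED["strength_gpa"])
strain_ratio = fold_change(COATED["strain_pct"], UNCOATED["strain_pct"])

print(f"fracture strength ratio: {strength_ratio:.2f}")
print(f"fracture strain ratio:   {strain_ratio:.2f}")
```

The ratios come out near 2.46 and 1.55, which round to the factors of 2.5 and 1.6 stated in the text.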

Dual-directional freezing of HA@A-ZrO₂ nanowire
dispersions in the presence of polyvinyl
alcohol (PVA) was used to self-assemble macro- 
scale composites with parallel arrangement of 
the nanowires (Fig. 1H). The polydimethylsilox- 
ane (PDMS) wedge produced a bidirectional 
temperature gradient, driving the ice crystal 
growth in perpendicular and parallel directions 
(Fig. 1H). The perpendicular growth of the ice 
crystals forced the HA@A-ZrO₂ nanowires and
PVA to occupy the gaps between ice lamellae, 
and the parallel growth forced them to acquire 
a parallel orientation (fig. S11). After freeze- 
drying (fig. S12) and mechanical compression, 
dense artificial tooth enamel (ATE) was prod- 
uced (Fig. 1H and fig. S13). 

ATE is machinable and can be formed into 
tooth-like macroscopic shapes (Fig. 1I) with
densely packed parallel columns with microscale 
alignment (Fig. 1J). X-ray nanotomography of 
ATE reveals that the nanowires exhibit an over- 
all architecture of parallel columns for the bulk 
composite (fig. S14). The AIP layers between the 
HA nanowires are nearly identical to those in 
enamel (Fig. 1C), as they have a thickness of 
~5 nm (Fig. 1K). Higher-magnification obser- 
vation shows the AIP and verifies that the HA 
and AIP are closely connected (Fig. 1L). EDS 
mapping and line scanning (fig. S15) of the 
crystal-amorphous-crystal interface further demonstrate
that amorphous ZrO₂ fills the gaps
between HA nanowires. Spectroscopic charac- 
terizations (figs. S16 and S17) including Raman, 
Fourier transform infrared spectroscopy (FTIR), 
and x-ray photoelectron spectroscopy implied 
strong chemical adhesion (Fig. 1M) as a result of
coordination between Zr⁴⁺ and O of PO₄³⁻
and -OH. This is in contrast to simple physical 

Fig. 1. Tooth enamel and synthesis of artificial tooth enamel (ATE).
(A) Optical photographs of tooth enamel, including a closeup view and a
sectional slice perpendicular to the midcoronal cervical plane. (B) Scanning
electron microscope (SEM) image of tooth enamel. (C) Transmission electron
microscopy (TEM) image of tooth enamel. (D) Enlarged TEM image taken from
the red zone in (C). (E) Molecular structure of HA with amorphous intergranular
phase (AIP). (F) Schematic illustration of construction of the mimicked tooth
enamel process; the corresponding SEM and TEM images of HA and HA@A-ZrO₂
are displayed on the left and right side, respectively. The exposed -OH on HA
and the balance between nucleation and growth of ZrO₂ precursor is the key to
formation of the amorphous thin layer. (G) The tensile mechanical performance
of HA with and without AIP, showing that the AIP indeed makes the nanowire
stronger and tougher. (H) Schematic illustration of micro- and macro-assembly
of the HA@A-ZrO₂ nanowires coupled with PVA. (I) Optical photographs of
ATE, including a closeup view and a sectional slice perpendicular to the midcoronal
cervical plane. (J) SEM image of ATE. (K) TEM image of ATE. (L) Enlarged
TEM image taken from the blue zone in (K). (M) Molecular structure of HA and
amorphous ZrO₂ (A-ZrO₂).


absorption (17), which may contribute to im- 
proved mechanical properties. 

We evaluated the mechanical performance 
of ATE by both nanoindentation (for stiffness, 
hardness, and viscoelasticity, with load direction
parallel to the nanowires) and the three-point
bending test (for strength and toughness,
with load direction perpendicular to the nano- 
wires). To explore the role of each structural 
element, we also produced two benchmark 
HA-based composites, namely ATE from nano- 
wires with no AIP (ATE-NAIP) (fig. S18) and 
a composite with an HA@A-ZrO₂ nanowire
loading similar to ATE but with no ordered 
microstructure (ATE-NOM) (fig. S19). We evaluated
Young’s modulus (E), hardness (H),
storage modulus (E′), damping coefficient
(tan δ), strength (σ), and toughness (KIc) of these

Fig. 2. Mechanical properties of ATE. (A) Left, schematic illustration of
ATE, ATE-NAIP, ATE-NOM, and enamel; right, mechanical performance
of different enamel-like composites with Young’s modulus (E), hardness (H),
storage modulus (E′), damping coefficient (tan δ), flexural strength (σ), and
toughness (KJc), for ATE, ATE-NAIP, ATE-NOM, and enamel plotted as a radar
map. (B) Young’s modulus and hardness of ATE and referenced engineering


materials, comparing them with each other and 
with natural tooth enamel (Fig. 2A and fig. S20). 
The average E and H of ATE, calculated from
the quasistatic nanoindentation, reach 105.6 ±
12.1 GPa and 5.9 ± 0.6 GPa, respectively. These
values are higher than those of natural enamel 
(Fig. 2A), whereas the inorganic content of ATE 
(78.06 wt %, fig. S21) is far less than that of 
enamel (>96 wt %) (18). The presence of
high-modulus ZrO₂ is one reason that the modulus
of ATE surpasses that of enamel. The twofold 
increase in stiffness and hardness compared 
with ATE-NAIP and ATE-NOM indicates the 
necessity of the nanoscale crystal-amorphous 
interface and parallel organization of columns 
at the microscale to reach the high values of 
macroscale stiffness and hardness seen in ATE. 
This conclusion can be confirmed by compar- 
ing the stiffness and hardness of ATE to those 
of stiff natural biomaterials (e.g., teeth, nacre, 
bones of many animals), previously reported 
HA-based composites, and ceramic-polymer 
composites (19) (Fig. 2B and table S1). 

ATE exhibits high viscoelasticity without
sacrificing its stiffness and hardness (20) (Fig.
2C and table S2). The average storage modulus
of ATE can be up to 78.6 ± 9.8 GPa with a
frequency of 10 Hz, whereas the average damping
coefficient (tan δ) of ATE reaches ~0.07 (Fig. 2A
and fig. S20, C and D), which exceeds the limits of 
traditional engineering materials with similar 
storage modulus such as ceramics and metals 
(20) (usually 0.001 to 0.01). The viscoelastic 
figure of merit (VFOM, defined as the product 
of E′ and tan δ) is as high as 5.5 GPa, nine times
as high as the limit of traditional engineering
materials (~0.6 GPa) and six times as
high as in the ZnO-based enamel-like composites
(2). The data for ATE-NAIP and ATE-NOM
composites (Fig. 2A) point to the importance 
of the AIP and hierarchical organization with 
microscale alignment. Considering that clinically
relevant loadings for natural tooth enamel
occur at a typical frequency of 1 Hz (21), we have
also tested the viscoelastic performance of ATE
at a frequency of 1 Hz. The E′ and tan δ of ATEs
materials including biomaterials and HA/other ceramic-based composites.
The performance of ATE is outstanding. (C) Storage
modulus and damping coefficient of ATE compared with biomaterials,
HA-based composites, ceramics, and ceramic-based composites.
(D) Flexural strength and toughness of ATE compared with biomaterials
and HA-based composites.


are 73.5 ± 10.1 GPa and ~0.075, respectively
(fig. S22), comparable to the values for 10 Hz. 
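
The viscoelastic figure of merit defined above is simply the product of the storage modulus and the damping coefficient, so the quoted values can be cross-checked directly. The Python sketch below does this for the 10 Hz and 1 Hz data and compares the result with the ~0.6 GPa limit cited for traditional engineering materials. The function name `vfom` and the constant names are illustrative assumptions; the inputs are the mean values from the text, with the reported uncertainties ignored.

```python
def vfom(storage_modulus_gpa: float, tan_delta: float) -> float:
    """Viscoelastic figure of merit (GPa): storage modulus E' times tan(delta)."""
    return storage_modulus_gpa * tan_delta


# Mean values quoted in the text (uncertainties omitted)
E_STORAGE_10HZ, TAN_DELTA_10HZ = 78.6, 0.07   # GPa, dimensionless
E_STORAGE_1HZ, TAN_DELTA_1HZ = 73.5, 0.075    # GPa, dimensionless
TRADITIONAL_LIMIT_GPA = 0.6                   # limit cited for traditional materials

vfom_10hz = vfom(E_STORAGE_10HZ, TAN_DELTA_10HZ)
vfom_1hz = vfom(E_STORAGE_1HZ, TAN_DELTA_1HZ)
ratio_to_limit = vfom_10hz / TRADITIONAL_LIMIT_GPA

print(f"VFOM at 10 Hz: {vfom_10hz:.2f} GPa")
print(f"VFOM at 1 Hz:  {vfom_1hz:.2f} GPa")
print(f"10 Hz VFOM relative to the traditional limit: {ratio_to_limit:.1f}x")
```

Both frequencies give a figure of merit of about 5.5 GPa, roughly nine times the cited limit, which matches the statement that the 1 Hz performance is comparable to the 10 Hz values.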

ATE achieved a flexural strength of ~142.9 MPa 
and fracture strain of 0.018, which are superior 
to those of enamel (Fig. 2A and fig. S20E). The 
flexural strength and fracture strain of ATE are, 
respectively, ~2 and ~10 times as high as those 
of HA ceramic (22) (fig. S23). The nearly two- 
fold reduction of flexural strength in ATE-NAIP 
and ATE-NOM (Fig. 2A) compared with ATE 
further supports the necessity of multiscale 
engineering of organic-inorganic composites. 

Single-edge notched beam (SENB) tests were
carried out (23), and the mechanical behavior
was observed through in situ bending with an
ESEM. The typical stress-strain curves from
SENB tests are plotted in fig. S24. The initial
fracture toughness (KIc) of ATE (2.0 ± 0.5 MPa m^1/2)
is higher than that of both ATE-NAIP (1.0 ± 0.1 MPa m^1/2)
and ATE-NOM (1.2 ± 0.1 MPa m^1/2) (fig. S25),
meaning that ATE shows more resistance to 
the initial crack during deformation. Increased 
toughness during crack propagation can be 
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Fig. 3. Polymer confinement in ATE. (A) Small-angle x-ray diffraction (SAXD) of ATEs with different HA@A-ZrO2 nanowires contents. (B) Differential scanning 


calorimetry (DSC) analysis of ATEs with different inorganic content and PVA. (C) Schematic illustration of po 


A-ZrO2/PVA interfaces in ATE. 


evaluated by the so called R-curve effect (23), 
enabling us to calculate Kj,.. This type of 
toughness for ATE was Kj, = 7.4 + 0.4.MPam?, 
a 3.7-fold increase compared with K;, = 2.0 + 
0.5 MPa m”” (Fig. 2A and fig. S20F) and also 
higher than the fracture toughness of ATE- 
NAIP (1.6-fold increase), ATE-NOM (3.8-fold 
increase), enamel (1.6-fold increase), and refer- 
enced HA ceramics (12.3-fold increase) (22) 
(Fig. 2A). When compared with biomaterials 
and HA-based composites (24) (Fig. 2D and 
table S3), ATE shows an excellent combina- 
tion of high strength and high toughness. High 
toughness of ATE can also be observed with 
bending tests for angles unusually high for 
ceramic materials (fig. S26). Additionally, the 
strength and toughness of ATE tested in the 
direction parallel to the nanowires are also 
analyzed (fig. S27). Because of the anisotropic 
structure of ATE, the strength and toughness 
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of ATE parallel to the nanowires are lower 
than those perpendicular to the nanowires 
but are still higher than those of enamel with 
the same direction (fig. S27). 
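The fold increases quoted for the crack-growth toughness are simple ratios of the reported values. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative and not taken from the paper, and the comparator values are only back-calculated from the quoted fold increases, so they are approximate.

```python
def fold_increase(value: float, reference: float) -> float:
    """Return how many times larger `value` is than `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return value / reference


# Toughness values reported in the text for ATE (MPa m^1/2).
K_IC_ATE = 2.0  # initial fracture toughness
K_JC_ATE = 7.4  # crack-growth (R-curve) toughness

ratio = fold_increase(K_JC_ATE, K_IC_ATE)
print(f"K_Jc / K_Ic for ATE = {ratio:.1f}")  # prints 3.7

# Back-calculate the comparator toughness implied by each quoted fold increase.
# These are derived numbers, not measurements reported in the text.
quoted_folds = {"ATE-NAIP": 1.6, "ATE-NOM": 3.8, "enamel": 1.6, "HA ceramic": 12.3}
implied = {name: K_JC_ATE / fold for name, fold in quoted_folds.items()}
for name, toughness in implied.items():
    print(f"{name}: ~{toughness:.1f} MPa m^1/2")
```

Because the fold increases are rounded in the text, the implied comparator values should be read as rough consistency checks only.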

The excellent combination of mechanical properties can be attributed to ATE’s hierarchical enamel-mimetic structure and the design of inorganic and organic constituents. To understand the functional mechanism between the inorganic and organic constituents, we synthesized five types of ATEs with varying content of inorganic nanofillers (HA@A-ZrO2 to PVA ratios range from 1:1 to 5:1 and are defined as ATE-1 to ATE-5; the corresponding inorganic nanofiller content increased from 41.28 to 78.06%, fig. S21) and tested their mechanical properties (fig. S28). We found that stiffness (E), hardness (H), and viscoelasticity (VFOM) increased with increasing percentage of inorganic nanofillers, reaching the highest value for ATE-5; strength and toughness increased with the content of inorganic phase, reaching a plateau for ATE-3 (fig. S28). A greater concentration of inorganic nanofillers equates to
a denser composite and smaller distance be- 
tween inorganic nanowires, which can be dem- 
onstrated by the shift of peaks to higher degrees 
(Fig. 3A), as seen in the small-angle x-ray dif- 
fraction (25) spectrum. The distances between 
inorganic nanofillers affect the mobility of the 
polymeric chains, which can be evaluated by differential scanning calorimetry (26). The glass transition temperature (T_g) of PVA shifted strongly toward higher values as the concentration of inorganic components increased and even disappeared for ATEs with the highest inorganic content (Fig. 3B). This is attributed to suppression of the thermal motion of the polymer chains in the presence of more inorganic nanowires (27), as depicted schematically in Fig. 3C. When the inorganic phase content is small, the distance (d) between nanowires is larger than the critical distance, d_c = 2.88 nm (table S4), and the polymer chains become flexible, thus decreasing stiffness, hardness, and strength. With an increase in the inorganic phase, the mobility of the polymer is partially confined by the suitable distance (d = d_c), providing sufficient support and a strong interface connection, which leads to a simultaneous increase in stiffness, hardness, and strength. As the concentration of nanowires increases, the distance between them becomes smaller than the critical distance (d < d_c), which creates a strong confinement for the polymer chains (the thermal mobility of the polymer chains almost disappears) and provides strong support for the composite (28). As a result, the stiffness and hardness of ATE-5 are the highest, and higher than those of tooth enamel. However, this partially sacrifices the polymer’s mobility and its ability to adapt to changing interfaces between the organic and inorganic phases, leading to a slight decrease in strength and toughness of ATE-5, which can also be observed in cellulose composites (29). Regardless, the comprehensive mechanical performance of ATE-5 is still outstanding for overall stiffness, hardness, strength, viscoelasticity, and toughness. FTIR analysis (fig. S29) of A-ZrO2/PVA interfaces implies that there also exists chemical bonding between Zr4+ and -OH of PVA, and these strong chemical bonds strengthen the interface connection. Furthermore, considering that both the HA nanowires and the PVA matrix are closely connected to the AIP through chemical bonding as illustrated in Fig. 3C, right, the AIP provides a buffer layer that can not only facilitate the stress transfer but also enhance the inorganic-organic interface connection, which efficiently contributes to the outstanding mechanical performance of ATE.

Fig. 4. Deformation and failure modes in ATE. (A to C) SEM images taken from in situ three-point bending tests. (D) Schematic illustration of the crack deflection of ATE during the three-point bending test. (E) Enlarged image taken from the red boxed area in (B). (E1 and E2) Enlarged SEM images taken from the yellow boxed area and the orange boxed area in (E), respectively, showing crack bridging and nanowires pulling up. Scale bar, E1 and E2, 1 μm. (F and G) SEM images of the permanent deformation zone of ATE and enamel after nanoindentation, where the yellow arrows refer to the crack deflection route and the orange arrows refer to the interface delamination.

The improved mechanical properties ob- 
served during bending tests can be attri- 
buted to fracture-resistant deformation and 
crack deflection (Fig. 4, A to D). Specifically, 
when an external load was applied to the 
sample, the nanowires initially slid, dissipat- 
ing a considerable amount of energy as a 
result of tight binding between the organic 
phase and the inorganic amorphous layer. 
Similar to other biomaterials and composites, 
the confinement of the organic phase in the 
gap between the nanowires maximized the 
contribution of interfaces but also restricted 
their motion, thus improving crack deflection
(Fig. 4E). Pull-out of the nanowires (Fig. 4, E1
and E2) and fracturing of the sample with
large-range crack deflection (Fig. 4C) dissipated
a large amount of energy. Moreover, the pulled-out
nanowires can connect to each other to
restrict the further failure of the sample. Crack
splitting, bridging, and bunching (Fig. 4E) 
also occurred during crack propagation, which 
can further dissipate energy and result in ATE’s 
excellent flexural toughness without sacrificing 
strength (30). In comparison, ATE-NAIP ex- 
hibited a relatively small crack deflection 
owing to the lack of amorphous restriction. 
Additionally, ATE-NOM exhibited almost brittle 
fracture (fig. S30). 

To investigate the role of the enamel-like 
hierarchical architecture on increased stiffness, 
hardness, and viscoelasticity, we observed the 
permanent deformation zone obtained by a 
maximum load of 200 mN from the top view. 
We attributed their outstanding mechanical 
performance to sliding nanowires and crystal- 
amorphous phase-facilitated energy dissipation. 
Upon closer observation of the nanoindentation 
zone undergoing permanent deformation, we 
found jagged nanoscale cracks growing along 
the indenter (Fig. 4F, yellow arrows) and in- 
terface delamination (Fig. 4F, orange arrows) 
generated by the sliding, bending, and fracture 
of the nanowires. This can dissipate energy 
by transferring it from one nanowire to the 
organic layer and the adjacent nanowire, 
thus avoiding collapse of the structure and 
enhancing the stiffness, hardness, and viscoelasticity
of ATEs simultaneously (31). Similar
mechanical behavior is also detected in
enamel (8) (Fig. 4G), which means that the 
complex structure of ATE with the three iden- 
tified structural elements engenders enamel’s 
mechanical performance. 

In summary, we have engineered a multi- 
scale assembly pathway to macroscale analogs 
of tooth enamel, revealing atomic, nanoscale, 
and microscale organization of inorganic nano- 
structures similar to the original biomaterial. 
The designed biomimetic composite retaining 
the structural complexity of the biological pro- 
totype combines high stiffness, hardness, strength, 
viscoelasticity, and toughness.

methane ultra-emitters

Methane emissions from oil and gas (O&G) production and transmission represent a considerable
contribution to climate change. These emissions comprise sporadic releases of large amounts of 
methane during maintenance operations or equipment failures not accounted for in current inventory 
estimates. We collected and analyzed hundreds of very large releases from atmospheric methane 
images sampled by the TROPOspheric Monitoring Instrument (TROPOMI) between 2019 and 2020. 
Ultra-emitters are primarily detected over the largest O&G basins throughout the world. With a total 
contribution equivalent to 8 to 12% (~8 million metric tons of methane per year) of the global
O&G production methane emissions, mitigation of ultra-emitters is largely achievable at low costs
and would lead to robust net benefits in billions of US dollars for the six major O&G-producing 
countries when considering societal costs of methane. 


As the second most important contributor to global warming, methane (CH4) has continued to accumulate in the atmosphere at a rate of ~50 million metric tons (Mt) per year over the past two decades, primarily because of increases in agricultural activities, waste management, coal, and oil and gas (O&G) production (1, 2). Large discrepancies between atmospheric inversions, bottom-up inventories, and biogeochemical models remain largely unexplained (1, 3-5). This complicates attribution of the recent global rise in atmospheric methane to an anthropogenic or biogenic source, a possible decline in the atmospheric OH radical sink (6, 7), or changes in biogenic or anthropogenic sources (8). Evidence of a large underestimation of fossil sources was suggested by a recent analysis of 14CH4 isotopic ratios (9). Representing a quarter of anthropogenic emissions alone, emissions from O&G production activities have increased from 65 to 80 Mt per year over the past 20 years (10). This rapid increase imperils the success of the Paris Agreement (11). Anthropogenic emissions trends are partly explained by the increase in shale gas production in the US, which will soon be followed by the development of large, currently underexploited shale reserves in China, Africa, and South America (12). Although O&G emissions from national inventories have been widely underestimated by conventional reporting (13), airborne imagery surveys have confirmed the omnipresence of intermittent emissions, distributed according to a power law (14-16) with a right-hand tail resulting from very large O&G emissions, often referred to as super-emitters (17) (top 1% of emitters or >25 kg/hour) (18).

Until recently, observation-based CH4 emission quantification efforts were restricted regionally to short-duration aircraft surveys (lasting a few weeks) (19) or the deployment of in situ sensor networks (20, 21). Global efforts were limited by sparse sampling of coarse-resolution CH4 column retrievals, such as those from the GOSAT mission (22). More routine and higher spatially resolved emission quantification was made possible by the European Space Agency Sentinel 5-P satellite mission, which carried the TROPOspheric Monitoring Instrument (TROPOMI; launched 2018) (23). TROPOMI samples daily CH4 column mole fractions over the whole globe at moderate resolution (5.5 km by 7 km) and has revealed multiple individual cases of unintended very large leaks (24) and regional basin-wide anomalies (25, 26). We systematically examine this dataset over multiple locations worldwide, which allows us to statistically characterize visible ultra-emitters (>25 tons/hour) of CH4 from O&G activities across various basins. By nature, reducing these ultra-emitters by enforcing leak detection and repair strategies or by reducing venting during routine maintenance and repairs provides an actionable and cost-efficient solution for emission abatement (27).

Detection of atmospheric column CH4 enhancements
from single point sources is limited
by TROPOMI instrument sensitivity [5 to 10
parts per billion (ppb)] (28), by the overlap of
multiple plumes from closely located natural 
gas facilities (e.g., in the Permian basin), and by 
complex spatial gradients from remote sources 
that affect background conditions (supple- 
mentary materials). Rapidly varying meteoro- 
logical conditions require sufficiently robust 
approaches, especially with curved CH4 plume
structures for which common mass balance 
methods are too simplistic (29). We addressed 
this problem by applying an automated plume 
detection algorithm and quantified the asso- 
ciated emissions using the Lagrangian particle 
model HYSPLIT (30) driven by meteorological 
reanalysis products for each detected plume 
enhancement (>25 ppb averaged over sev- 
eral pixels; supplementary materials) over the 
whole globe. The detection threshold was 
adjusted to exclusively capture statistically 
significant enhancements against highly var- 
iable backgrounds (supplementary materials). 
Finally, we estimated the potential reductions 
along with abatement costs for various coun- 
tries, to determine effective gains at national 
levels. 

The number of detections of large total column
CH4 mole fraction enhancements around
the world, each associated with an ultra-emitter, 
totals >1800 single observed anomalies over 
2 years (2019-2020); a large fraction of them 
are located over Russia, Turkmenistan, the US 
(excluding the Permian basin where regional 
enhancements comprise many small to medium 
emitters), the Middle East, and Algeria (Fig. 1). 
Detections vary in magnitude and number 
(between 50 and 150 per month), most of them 
corresponding to O&G production or trans- 
mission facilities (about two-thirds of detec- 
tions, or ~1200), whereas ultra-emitters from 
coal, agriculture, and waste management rep- 
resent only a relatively small fraction (33%) 
of total detections (supplementary mate- 
rials). Ultra-emitters attributed to O&G infra- 
structure appear along major transmission 
pipelines and over most of the largest O&G 
basins, representing more than 50% of total 
onshore natural gas production worldwide 
(10). Offshore emissions remain invisible to 
TROPOMI, and cloud cover almost entirely 
blocks O&G basins in tropical areas; hence, 
these are excluded from our analysis (supple- 
mentary materials). 

Fig. 1. Global map of ~1200 O&G detections from TROPOMI between 2019 and 2020 (upper panel), zoomed in over Russia and Central Asia (lower left panel), including the main gas pipeline (dark gray) and an example of a detected plume over northern Africa (Hassi Messaoud; lower right panel). Circles are scaled according to the magnitude of the ultra-emitters. Undetermined sources are indicated in blue. [Map: MapBox]

Estimated emissions from O&G ultra-emitters rank highest for Turkmenistan with 1.3 Mt of CH4 per year, followed by Russia, the US (excluding the Permian basin), Iran, Kazakhstan, and Algeria (Fig. 2A). Because leak duration varies and S5-P provides only snapshots, each leak duration was determined either on the basis of an observed duration deduced from the plume length (advection time) or by setting a 24-hour duration when consecutive images confirm the presence of the same anomaly over multiple days (Fig. 2A). Leaks lasting several days were adjusted according to lack of coverage and hence quantized to 24 hours (supplementary materials). Two additional scenarios, based on (i) a systematic 24-hour duration and (ii) the length of the observed plumes, were constructed to define the upper and lower bounds of durations (supplementary materials). The lack of coverage due to clouds, albedo, or aerosols was quantified by adjusting for the number of observed days compared with the full period length (supplementary materials). Uncertainties were quantified by a negative binomial probability function (31) (supplementary materials). We illustrate this adjustment in Fig. 2A, which is large for some countries (e.g., Russia), by subsampling the coverage over Turkmenistan (originally 118 detected ultra-emitters) with the lowest coverage observed over Iran (22). After adjustment, estimated emissions fall within 2% of the original estimate, and estimated uncertainty (1.26 MtCH4) matches the full statistical test on the interval 0.96 to 1.6 MtCH4 (fig. S10). On the basis of adjusted emissions, O&G ultra-emitter estimates represent 8 to 12% of global O&G CH4 emissions (according to national inventories; Fig. 2C), a contribution not included in most current inventories (13).
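The duration and coverage adjustments described above can be illustrated with a small sketch. It assumes, purely for illustration, that the emitted mass of each detection is rate times duration and that missing coverage is corrected by a simple linear scaling of observed days to the full period; the actual procedure is detailed in the supplementary materials and may differ. All names and input values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    rate_t_per_h: float  # estimated emission rate (tons of CH4 per hour)
    duration_h: float    # assumed leak duration (hours)


def plume_duration_h(plume_length_km: float, wind_speed_km_h: float) -> float:
    """Advection time: how long the wind needs to carry gas along the plume."""
    if wind_speed_km_h <= 0:
        raise ValueError("wind speed must be positive")
    return plume_length_km / wind_speed_km_h


def annual_emissions_t(detections, observed_days: int, period_days: int) -> float:
    """Sum emitted mass, scale for unobserved days, and express per year."""
    if not 0 < observed_days <= period_days:
        raise ValueError("observed_days must be in (0, period_days]")
    observed_mass = sum(d.rate_t_per_h * d.duration_h for d in detections)
    coverage_scaling = period_days / observed_days  # linear scaling (assumption)
    years = period_days / 365.0
    return observed_mass * coverage_scaling / years


# Hypothetical inputs: one plume-length-based duration, one 24-hour duration.
detections = [
    Detection(rate_t_per_h=100.0, duration_h=plume_duration_h(120.0, 20.0)),
    Detection(rate_t_per_h=50.0, duration_h=24.0),
]
total = annual_emissions_t(detections, observed_days=146, period_days=730)
print(f"{total:.0f} tons of CH4 per year")  # prints 4500
```

Switching every `duration_h` to 24 hours, or to the plume-based value, would reproduce the spirit of the two bounding scenarios mentioned in the text.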

Fig. 2. Country-level emissions from O&G ultra-emitters between 2019 and 2020 observed and estimated (adjusted for leak duration and lack of coverage), together with two extreme leak duration scenarios: (A) relative size of the estimated ultra-emitters to two national scale methane inventories, EDGAR 5.0 and EPA; (C) distribution of super-emitters and ultra-emitters from airborne visible-infrared imaging

With one of the largest natural gas reserves in the world [~20 trillion m³, ranking fourth in the world according to the International Energy Agency (IEA) (10)], Turkmenistan is likely to see its O&G CH4 emissions double relative to inventory estimates based on mean emissions, as its ultra-emitters are not accounted for by current inventory calculation methods (Fig. 2C).
Ultra-emitters are also relatively common and 
particularly large in Russia, Iran, and Kazakhstan, 
representing between 10 and 20% of annual 
reported emissions. The US was found to have 
fewer ultra-emitters (5% of annual inventory 
emissions), but we excluded the Permian basin 
(~10% of US natural gas production) as a result 
of the large, basin-wide XCH4 enhancement
which obscures single detections (32). A recent
study estimated the O&G CH4 emissions from
the Permian basin at 2.7 Mt per year using 
TROPOMI (33), which represents 35% of US 
O&G production emissions from the top-down 
estimate for the entire US (13). Because of 
the higher density of flaring equipment in the 
Permian basin, we assume that the proportion 
of ultra-emitters over the US (excluding the 
Permian basin) represents a lower bound at the 
country scale. Middle Eastern countries such 
as Iraq or Kuwait have even fewer detections 
(31 detected ultra-emitters) possibly because 
of fewer accidental releases and/or more 
stringent maintenance operations. The detec- 
tion limit of ultra-emitters is around 25 tons 
of CH4 per hour, whereas the largest events 
reach several hundred tons per hour with as- 
sociated plumes spanning hundreds of kilo- 
meters. Countries such as Kuwait, Iraq, and 
Saudi Arabia, all major gas producers, have
few ultra-emitters despite clear sky conditions
and homogeneous albedo. However, ultra- 
emitters from oil and gas basins throughout 
the world unequivocally follow a power-law 
distribution (Fig. 2B), which implies that if 
the power-law coefficients are well defined, 
ultra-emitters should scale directly with smaller 
emitters. To establish this relationship over 
a broader range of emissions, the power-law 
of smaller emitters (from 0.1 to 10 tons of 
CH4 per hour) observed in high-resolution 
airborne imaging spectrometer surveys of 
California (15) and the Permian basin (16) was
combined with the S5-P-derived power-law for 
ultra-emitters alone, revealing similar regres- 
sion parameters (slope 1.9 to 2.3; Fig. 2B). The 
actual number of ultra-emitters varies by 
country (Fig. 2D) but the relationship between 
the number of events and their magnitudes 
remains similar, in the range of 0.1 to 300 tons 
of CH4 per hour over two gas basins in the US. 
Very small leaks (<100 kg of CH4 per hour),
mostly caused by nominal operations (i.e.,
pneumatic devices), might fall within a different
relationship (34), whereas larger leaks
are mostly accidental or related to specific 
maintenance operations (35). Overall, the total 
fraction of CH4 emissions from ultra-emitters
remains difficult to quantify accurately owing
to the lack of observations of smaller emitters,
but their relative contribution compared with
known sources is nonnegligible and thus offers
a cost-efficient and actionable opportunity to
reduce CH4 emissions, while natural gas
production increases steadily by ~3% per
year (10).
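The power-law relationship between event magnitude and event count can be illustrated with a generic log-log regression. The sketch below is not the fitting procedure used in the study (binning and fitting details are not given here); it only shows how a slope such as the 1.9 to 2.3 range quoted above could be recovered from rate and count pairs. The data are synthetic and the function name is hypothetical.

```python
import math


def fit_power_law(rates, counts):
    """Least-squares fit of log10(count) = intercept - slope * log10(rate).

    Returns (slope, intercept), with slope reported as a positive number
    for a distribution whose counts decrease with emission rate.
    """
    xs = [math.log10(r) for r in rates]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return -gradient, intercept


# Synthetic example: counts fall off as rate^-2 across four decades of
# emission rate (tons of CH4 per hour), spanning small and ultra-emitters.
rates = [0.1, 1.0, 10.0, 100.0]
counts = [1.0e6 * r ** -2.0 for r in rates]
slope, intercept = fit_power_law(rates, counts)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With real detections, the counts would come from binning events by magnitude, and the choice of bins would affect the fitted slope.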

We evaluate the industry spending required 
to eliminate ultra-emitter-based methane emissions
based on analyses of mitigation costs recently
produced by several groups: the IEA (10),
the US Environmental Protection Agency (US 
EPA) (36), and the International Institute for 
Applied Systems Analysis (IIASA) (37). All 
costs are evaluated in 2018 US dollars per ton 
of methane. Briefly, we first analyze marginal 
abatement cost curves developed by these 
groups at the national level (regional level 
for IIASA), excluding valuation of environ- 
mental impacts. Because large emissions are 
expected to be related to upstream opera- 
tions or long-distance transport of fuels, we 
exclude local distribution networks from the 
IIASA analysis as it separates those sources. 
The IEA analysis provides separate cost estimates
for high emission sources, whereas
the EPA and IIASA do not. However, methane
emissions from ultra-emitters are expected 
to be more cost-effective to mitigate than 
average sources, and IEA estimates for our 
six countries of interest show costs of ~$110 
to $300 per ton less than the average cost of 
mitigation in the O&G sector in those coun- 
tries. We therefore evaluate average mitiga- 
tion costs within the O&G sector for EPA and 
IIASA analyses, screening for the subset of 
measures costing <$600 per ton. This same 
threshold was recently used to define “low- 
cost” controls (38) and would correspond 
to ~$20 per ton of carbon dioxide equivalent 
if converted using the Intergovernmental 
Panel on Climate Change Sixth Assessment 
Report’s GWP100 value of 29.8 for fossil meth- 
ane. Averaged across these mitigation analy- 
ses, spending is net positive in Iran (~$60 per 
ton) but is net negative in all other high-emission 
countries with net savings ~$100 to $150 per 
ton in Russia, Kazakhstan, and Turkmenistan, 
~$250 per ton in the US, and $400 per ton in 
Algeria, though values vary greatly across avail- 
able analyses (Fig. 3A). 
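The conversion of the $600 per ton of methane threshold into a carbon dioxide equivalent price is a single division by the GWP100 value quoted above. A minimal sketch, with an illustrative function name:

```python
GWP100_FOSSIL_CH4 = 29.8  # AR6 GWP100 for fossil methane, as quoted in the text


def cost_per_ton_co2e(cost_per_ton_ch4: float, gwp: float = GWP100_FOSSIL_CH4) -> float:
    """Convert a cost per ton of CH4 into a cost per ton of CO2 equivalent."""
    if gwp <= 0:
        raise ValueError("gwp must be positive")
    return cost_per_ton_ch4 / gwp


threshold = cost_per_ton_co2e(600.0)
print(f"${threshold:.1f} per ton of CO2 equivalent")  # prints $20.1
```

The result is consistent with the ~$20 per ton of carbon dioxide equivalent stated in the text.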

Examining the total spending required to 
eliminate the high-emission sources in each 
country, there is a large spread across the 
available analyses: Iran has the largest average 
expenditure ($16 million), but values range from 
−$30 million to $95 million throughout the
analyses. Results for the US are more robust 
in that all show net savings, but the values 
still vary markedly ranging from $19 million 
to $217 million. The IIASA values are the most 
favorable (lowest) in five of the six countries 
but the least favorable in Iran (though IIASA 
provides averages across the Middle East, which 
may affect that result). IEA values are typically 
the least favorable, with the US EPA values in 
the middle, except for Russia and Kazakhstan 
where the EPA values are the highest. Averaged
across the three analyses, the largest total
benefits (a function of costs and emissions 
magnitude) appear to lie in Turkmenistan, 
with net savings of ~$200 million, followed by 
Russia and the US, with net savings of ~$100 
million each. 

We also evaluate societal costs when account- 
ing for monetized environmental impacts. We 
incorporate the recently described valuation 
from the Global Methane Assessment (38), 
which assigns a value of $4400 per ton of 
methane, accounting for the manifold impacts
of methane on climate and surface ozone, both
of which affect human health (mortality and 
morbidity), labor productivity, crop yields, and 
other climate-related impacts. In addition to 
these effects, controlling high emitters in the 
six highlighted countries leads to robust net 
benefits of ~$6 billion for mitigation for Turkme- 
nistan, ~$4 billion for Russia, ~$1.6 billion for 
the US, ~$1.2 billion for Iran, and ~$400 mil- 
lion each for mitigation in Kazakhstan and 
Algeria. The range across the three mitigation 
cost analyses is small in this case, ~10%
(Fig. 3B). Our valuation of societal costs is much
larger than current European Union emissions 
prices using GWP100 (~$1130 per ton) because 
we include effects related to air pollution 
and ~50% larger than values using GWP20 
(~$2770 per ton). 
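As a rough consistency check on the net societal benefits, the valuation of $4400 per ton can be combined with the emission and mitigation cost figures given earlier. The sketch below assumes that the net benefit is emissions multiplied by the societal value minus the mitigation cost, and it uses the midpoint of the ~$100 to $150 per ton net savings quoted for Turkmenistan; both choices are illustrative assumptions, not the study's stated method.

```python
SOCIETAL_VALUE_PER_TON = 4400.0  # USD per ton of CH4, from the Global Methane Assessment


def net_benefit_usd(emissions_t: float, mitigation_cost_per_ton: float) -> float:
    """Societal value of avoided methane minus what it costs to avoid it.

    A negative mitigation cost represents net savings to the operator.
    """
    return emissions_t * (SOCIETAL_VALUE_PER_TON - mitigation_cost_per_ton)


# Turkmenistan: 1.3 Mt of CH4 per year (from the text); mitigation cost taken
# as -125 USD per ton, the midpoint of the quoted savings (assumption).
benefit = net_benefit_usd(1.3e6, -125.0)
print(f"~${benefit / 1e9:.1f} billion")  # prints ~$5.9 billion
```

This lands close to the ~$6 billion reported for Turkmenistan, which suggests that the societal valuation dominates the result and explains why the spread across mitigation cost analyses is small.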

In terms of net climate benefits, eliminating 
methane emissions from ultra-emitters would 
lead to 0.005° ± 0.002°C of avoided warming
over the next one to three decades on the 
basis of linearized estimates from prior mod- 
eling (38). Though small, this value is ap- 
proximately equal to the total influence from 
all emissions since 2005 from Australia or 
the Netherlands (39), or removal of 20 million 
vehicles from the road for 1 year. The avoided 
warming would prevent ~1600 ± 800 premature
deaths annually due to heat exposure
and ~1.3 ± 0.9 billion hours of labor
productivity lost annually due to exposure to 
heat and humidity, with the latter valued at 
~$200 million per year. 

On the basis of the power-law distribution 
of emitters, we derived a detection threshold 
of 25 tons of CH4 per hour, in agreement with 
previous estimates (40) using a cross-sectional 
flux approach to estimate the leakage rates 
of a major leak in Turkmenistan. For lower 
emission rates, the number of emitters invi- 
sible to TROPOMI far surpasses visible ultra- 
emitters, as suggested by airborne surveys 
over oil and gas production basins in California, 
the Four Corners region, and the Permian basin 
(14, 15, 41). High-resolution satellite imagery 
from Sentinel-2 (42) or from PRISMA and 
GHGSat (41) depicts turbulent XCH4 plume
structures, enabling facility attribution and
quantification of leaks above 5 tons of CH4
per hour. These imagers offer limited cover- 
age (tasking mode or along-track scanning 
over small regions), which suggests that com- 
bined use with TROPOMI is necessary to 
achieve monitoring needs. Additional satellite 
instruments are planned to be launched in the 
near future (e.g., EnMAP, Carbon Mapper, 
SBG, CHIME, EMIT) offering high-resolution 
images (30 to 60 m resolution) or MethaneSAT 
(43) (130 by 400 m resolution) over selected 
high-priority areas, precursors to full constel- 
lations of imagers covering the globe daily. 
Until then, and given the robust power-law
distribution of CH4 ultra-emitters, the link
between intermittent high-resolution imagery
and regular low-resolution images from
TROPOMI can help fill gaps in coverage.
Improved attribution of methane to specific
facilities or operations remains critical
to support the development of robust national
emissions inventories as defined by
the United Nations Framework Convention
on Climate Change, to inform oil and gas operators
of accidental releases, and to help
regulators assess progress toward CH4 emission
targets.
Nocturnal survival of isoprene linked to formation of
upper tropospheric organic aerosol


Paul I. Palmer, Margaret R. Marvin, Richard Siddans, Brian J. Kerridge, David P. Moore


Isoprene is emitted mainly by terrestrial vegetation and is the dominant volatile organic compound (VOC) in Earth’s atmosphere. It plays key roles in determining the oxidizing capacity of the troposphere and the formation of organic aerosol. Daytime infrared satellite observations of isoprene reported here broadly agree with emission inventories, but we found substantial differences in the locations and magnitudes of isoprene hotspots, consistent with a recent study. The corresponding nighttime infrared observations reveal unexpected hotspots over tropical South America, the Congo basin, and Southeast Asia. We used an atmospheric chemistry model to link these nighttime isoprene measurements to low-NOx regions with high biogenic VOC emissions; at sunrise the remaining isoprene can lead to the production of epoxydiols and subsequently to the widespread seasonal production of organic aerosol in the tropical upper troposphere.


The main source of atmospheric isoprene [CH2=C(CH3)-CH=CH2] is terrestrial plants, including some mosses, ferns, gymnosperms, and angiosperms (1). Isoprene is emitted from leaves, with the emission rate dependent mostly on temperature and photosynthetically active radiation (PAR). Only at chronic levels of water stress and temperature do isoprene emissions cease; even after this stress is alleviated, the emissions resume, sometimes at rates higher than before the stress began (2). The main loss of atmospheric isoprene is oxidation by the hydroxyl radical (OH), resulting in a typical lifetime of 1 hour, which varies according to the photochemical environment. The importance of isoprene emissions lies in the resulting atmospheric chemistry. Inventories estimate the global annual mean flux of isoprene to be ~500 Tg C (3) but with a large uncertainty that reflects individual model assumptions (4) and the sparsity of measurements that underpin inventories. Tropical ecosystems represent ~80% of this global total (3), although estimates inferred from satellite observations suggest that this fraction might be overestimated by 30% (5). The fate of isoprene oxidation products depends on the relative abundances of nitrogen oxides (NOx = NO + NO2) and hydroperoxy (and organic peroxy) radicals. Broadly, at comparatively high NOx levels, the chemistry results in the rapid production of formaldehyde (HCHO) that is now routinely observed by Earth-observing satellites and can be used to infer the emissions of the parent hydrocarbons, predominantly isoprene (6-8). At comparatively low levels of NOx, isoprene peroxy radicals react with hydroperoxy radicals to form an organic peroxide, and to a lesser extent they can undergo unimolecular isomerization reactions. The peroxides can then be further oxidized by OH (lifetime of ~3 to 5 hours) to form isoprene epoxydiols (IEPOX) (9). The atmospheric lifetime of IEPOX against oxidation by OH is ~20 to 30 hours, during which time it can partition into the particle phase, either by condensation or reactive uptake by preexisting particles, to form IEPOX secondary organic aerosol (SOA) (10-13).
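The dependence of the isoprene lifetime on the photochemical environment follows from the usual expression for a lifetime against reaction with OH, namely 1/(k[OH]). The sketch below uses an approximate room-temperature rate coefficient of 1.0e-10 cm^3 molecule^-1 s^-1 for isoprene + OH and a few illustrative OH concentrations; neither the rate coefficient nor the concentrations are taken from this article, and they are labeled as assumptions in the code.

```python
# Approximate room-temperature rate coefficient for isoprene + OH
# (cm^3 molecule^-1 s^-1); an assumed textbook-scale value, not from the article.
K_OH_ISOPRENE = 1.0e-10


def lifetime_hours(k_cm3_per_s: float, oh_molec_per_cm3: float) -> float:
    """Chemical lifetime against OH oxidation, tau = 1 / (k * [OH]), in hours."""
    if k_cm3_per_s <= 0 or oh_molec_per_cm3 <= 0:
        raise ValueError("rate coefficient and OH concentration must be positive")
    return 1.0 / (k_cm3_per_s * oh_molec_per_cm3) / 3600.0


# Illustrative OH levels (assumptions): low nighttime-like, moderate, and
# high daytime-like concentrations in molecules per cm^3.
for oh in (1.0e5, 1.0e6, 3.0e6):
    tau = lifetime_hours(K_OH_ISOPRENE, oh)
    print(f"[OH] = {oh:.0e} molec cm^-3 -> isoprene lifetime ~ {tau:.1f} h")
```

Under these assumptions a daytime-like OH level gives a lifetime of about an hour, while a much lower OH level stretches it to more than a day, which is the kind of regime in which isoprene could persist through the night.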
Recent studies have reported isoprene re- 
trievals from highly spectrally resolved infrared 
CR) data from the Cross-track Infrared Sounder 
(CrIS) aboard the NASA/NOAA Suomi-NPP 
satellite (14); these data have been used to 
revise isoprene emission inventories (15). 
The satellite was launched in 2011 into a Sun- 
synchronous orbit with equator-crossing local 
times of 0130 and 1330. CrIS measures IR 
radiation in three bands that span 650 to
2550 cm⁻¹ at a spectral resolution of 1.25 cm⁻¹
(16). Here, we report the results from an
independent, physically based CrIS isoprene
retrieval scheme (17) that uses an optimal
estimation approach (16), which naturally
provides a full error characterization of the a
posteriori solution. We retrieve isoprene
columns from spectral features in the range 890
to 908 cm⁻¹ sampling 21 CrIS channels. The
retrieval scheme assumes the isoprene volume
mixing ratio to be height-invariant, so that the
geographical distribution of retrieved columns
is exclusively due to measurement information,
in contrast to (14). The vertical sensitivity
needed for quantitative interpretation
of “effective” columns is provided in the form
of scene-dependent averaging kernels (16).

Fig. 1. Nighttime (local equatorial overpass time 0130) effective isoprene columns.
(A and B) April 2018. (C and D) July 2018. (E and F) September 2018.
(G and H) December 2018. The magnitudes of these effective columns depend
on how the isoprene vertical profile is represented in the retrieval and on
scene-dependent averaging kernels (16). The GEOS-Chem model is sampled at
the time and location of each CrIS measurement and is convolved with the
scene-dependent averaging kernel. Satellite data and model output are filtered to
exclude scenes where co-retrieved aerosol optical thickness (AOT) > 0.05 and
effective cloud fraction (= cloud fraction × cloud top height) > 4 km. Black
squares over tropical South America and tropical Africa denote regions used in
Figs. 2 and 3 and the corresponding plots over tropical Africa (16).

Fig. 2. GEOS-Chem model altitude-time cross sections of isoprene, NO, and some isoprene gas-phase and organic aerosol oxidation products over tropical
South America during April 2018. The region is defined as 6° to 12°S and 65° to 70°W and is denoted by the smaller rectangle in Fig. 1. (A) Isoprene (ppbv).
(B) NO (ppbv). (C) Production rate of IEPOX (molec cm⁻³ s⁻¹). (D) IEPOX (ppbv). (E) Production rate of IEPOX SOA (molec cm⁻³ s⁻¹). (F) IEPOX-SOA (µg m⁻³ at
standard conditions of 273 K and 1013 hPa unless otherwise stated). Time is expressed in days at Coordinated Universal Time (UTC). The solid black lines denote the
GEOS 3-hourly boundary layer height.

We describe daytime and nighttime measure- 
ments during 2018, taking advantage of the 
nature of IR measurements that do not require 
sunlight, and focus on nighttime measure- 
ments that have not previously been reported. 
Our daytime measurements are broadly con- 
sistent with recent studies (14-16). We also 
use HCHO and nitrogen dioxide (NO2) column
data from the Tropospheric Monitoring
Instrument (TROPOMI) aboard the
Copernicus Sentinel-5 Precursor, in orbit with 
the same equator-crossing time as CrIS, to help 
interpret the CrIS daytime isoprene column 
data (16). To interpret these column data, we 
use the GEOS-Chem global 3D model of at- 
mospheric chemistry that includes a detailed 
description of gas- and particle-phase chem- 
istry associated with isoprene (16, 18). To com- 
pare against CrIS, the model is sampled at the 
time and location of each measurement and 
convolved with scene-dependent averaging 
kernels to account for the vertical sensitivity 
of the retrieval (16). We evaluate our model 
results using measurements of total OA from 
the Atmospheric Tomography Mission (ATom, 
https://espoarchive.nasa.gov/archive/browse/ 
atom/id14/DC8) collected downwind of our 
study region (16, 19), representing the total 
amount of relevant aircraft data available 
during our study period. 
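
The sampling-and-convolution step described above can be sketched as follows. This is only an illustration: it assumes the standard optimal-estimation form for a column quantity, in which the effective column equals the prior column plus the kernel-weighted departure of the model from the prior in each layer. The function name, the layer-wise inputs, and the use of a column averaging kernel are assumptions made for this example; the retrieval's actual formulation is given in (16).

```python
def effective_column(model_layers, kernel, prior_layers=None):
    """Apply a column averaging kernel to model partial columns.

    model_layers : model partial columns per layer (e.g., molec cm^-2)
    kernel       : column averaging kernel value per layer (dimensionless)
    prior_layers : prior partial columns per layer; zeros if omitted
    """
    if prior_layers is None:
        prior_layers = [0.0] * len(model_layers)
    if not (len(model_layers) == len(kernel) == len(prior_layers)):
        raise ValueError("all inputs must have one value per layer")
    prior_column = sum(prior_layers)
    # Kernel-weighted departure of the model from the prior, summed over layers.
    departure = sum(a * (x - xa)
                    for a, x, xa in zip(kernel, model_layers, prior_layers))
    return prior_column + departure


if __name__ == "__main__":
    # Hypothetical three-layer profile; values are illustrative only.
    print(effective_column([1.0, 2.0, 3.0], [0.2, 0.8, 1.0]))
```

With a kernel of ones in every layer the result is the model's true column; with a kernel of zeros it collapses to the prior column, which is why effective columns must be compared like for like.
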

Figure 1 shows monthly distributions of 
CrIS and GEOS-Chem nighttime isoprene data 
(local equatorial overpass time of 0130) during
April, July, September, and December 2018. 
We apply inverse variances from the gridded 
isoprene columns to calculate these weighted 
monthly means, which helps to reduce noise 
in the aggregated CrIS measurements. Elevated 
values are consistently found over northern 
tropical South America (the northern half of 
Brazil, Colombia, Venezuela), central Africa 
(Congo basin, Angola), and Southeast Asia 
(northern Sumatra, the island of Borneo, Papua 
New Guinea). We find broad agreement be- 
tween CrIS and GEOS-Chem in the location 
and magnitude of the elevated isoprene during 
nighttime, when it is determined primarily 
by atmospheric transport that helps to mix 
the atmospheric signals from low- and high- 
emitting daytime regions. Consequently, iso- 
prene observed by CrIS at 0130 represents 
more of a homogeneous distribution than we 
find during the daytime, when columns are 
also strongly influenced by photochemistry 
(16). Our analysis of nighttime columns pro- 
vides confidence in the model to interpret the 
data, although there remain differences that 
mainly reflect errors in emission inventories 
(16). Here, we focus on the origin and evolu- 
tion of nighttime isoprene over tropical South 
America, with complementary analysis over 
tropical Africa, particularly over the Congo
Basin (6). The coarse model spatial resolution
(2° x 2.5°) precludes any meaningful compar- 
ison over maritime Southeast Asia associated 
with sub-grid-scale features. 
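
The inverse-variance weighting used for the monthly means can be illustrated with a short sketch. It assumes that each grid cell has a list of column values with associated variances and that the weight of each value is the reciprocal of its variance; the function name and the sample numbers are hypothetical.

```python
def inverse_variance_mean(values, variances):
    """Return the inverse-variance weighted mean and the variance of that mean."""
    if len(values) != len(variances) or not values:
        raise ValueError("need one positive variance per value")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    mean = sum(w * x for w, x in zip(weights, values)) / total_weight
    # For independent errors, the variance of the weighted mean is 1 / sum(weights).
    return mean, 1.0 / total_weight


if __name__ == "__main__":
    # Hypothetical columns for one grid cell; the noisier value counts for less.
    mean, var = inverse_variance_mean([1.0, 3.0], [1.0, 3.0])
    print(mean, var)
```

Noisy scenes therefore contribute less to the aggregate, which is the sense in which the weighting reduces noise in the gridded monthly fields.
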

Figure 2 shows GEOS-Chem model time- 
altitude plots for a region over tropical South 
America during April 2018 where nighttime 
isoprene is elevated (Fig. 2A). Isoprene emissions
(not shown) have a strong diurnal pattern
with values peaking in early afternoon,
coinciding with peak values of PAR and leaf 
temperatures, and absent during the night 
when PAR is zero. Loss of atmospheric iso- 
prene also peaks during the day, reflecting 
peak production rates of OH that are correlated 
with sun elevation angle.

Fig. 3. Monthly GEOS-Chem model vertical profiles of organic aerosol over tropical South America.
The region is defined as 6° to 12°S and 65° to 70°W and is denoted by the smaller rectangle in Fig. 1.
(A and B) IEPOX-SOA (µg std m⁻³) (A) and total OA (µg std m⁻³) (B) during April, July, September, and
December 2018. (C and D) Constituents of total OA during April 2018 (C) and September 2018 (D). ISOA,
TSOA, and ASOA denote SOA from isoprene, higher terpenes, and aromatic VOCs, respectively; POA and OPOA
denote primary OA and oxidized POA. Inset plots zoom in on total OA at altitudes between 6 and 14 km.

Current knowledge
suggests that atmospheric isoprene is quickly
consumed by OH (with a lifetime of ~1 hour). 
In pristine environments, where nitrogen 
oxide levels are low, the photo-oxidation of 
isoprene can lead to the suppression of OH 
(20). We find in the model that there re- 
mains substantial atmospheric isoprene dur- 
ing nighttime [>1 ppbv (parts per billion by 
volume); Fig. 2A], not just in the shallow 
boundary layer (27) but also in the mid- to 
upper free troposphere. Temporal and spatial 
variations in nighttime free tropospheric iso- 
prene are correlated with conditions at sun- 
set of the previous day when convective mass 
fluxes remain elevated and OH levels are 
beginning to decline (16). The lifetime of
isoprene during nighttime, determined pri- 
marily by ozone and nitrate oxidation, is 
much longer than during the daytime when 
it is determined primarily by OH oxidation. 
Over South America, the resulting free tropo- 
spheric production of IEPOX precursors and 
IEPOX begins at sunrise of the next day. Pe- 
riods of elevated NO found in the upper tropo- 
sphere (Fig. 2B) suppress the production of 
IEPOX (Fig. 2C), as expected. In the model, 
we find that this NO is mostly due to lightning
(6). As we show later, this helps to explain
why the origin of IEPOX SOA does not
necessarily coincide with the highest emissions
of isoprene. Free tropospheric levels of
gas-phase (Fig. 2D) and particle-phase IEPOX 
(Fig. 2F) closely follow their production rates 
(Fig. 2, C and E). 

Figure 3A shows the GEOS-Chem model 
mean vertical distributions of IEPOX-SOA over 
tropical South America (Fig. 3B, 6° to 12°S, 65° 
to 70°W) during April, July, September, and 
December 2018. Values are generally higher 
near the surface during July and September, 
reflecting higher isoprene emissions (6). Upper 
free tropospheric IEPOX-SOA levels are higher 
in April and December as a result of larger 
convective fluxes. Lumped SOA from IEPOX 
and other low-volatility isoprene oxidation 
products represents 8 to 34% of mean total 
OA below 6 km, and 22 to 62% above 6 km, 
during April and December (Fig. 3B). During 
April, most of the OA in the lowest 6 km is due 
to SOA from terpene oxidation (Fig. 3C)—an 
expected result based on the parameteriza- 
tion we use to describe this source of SOA 
(16) with a small contribution from oxygen- 
ated primary OA (OPOA). In the upper tropo- 
sphere, total OA is dominated by isoprene but 
with a 32% contribution from higher-order 
terpenes. In contrast, during September (Fig.
3D) when there is substantial biomass burning,
OPOA and anthropogenic SOA play a larger
fractional and absolute role in OA throughout 
the tropospheric profile. 

Figure 4 shows a comparison between mean 
aircraft profile measurements of total OA 
(Fig. 4A) from ATom-4 (16) and GEOS-Chem 
model values (Fig. 4B) off the coast of tropical 
South America during May 2018. [See (6) for 
model evaluation using other campaign data.] 
For ATom-4, we find that the mean model 
profile agrees within one standard deviation 
of the mean measured profile, except at alti- 
tudes of >12 km (representing only nine 1-min 
data points), where the model underestimates 
the measurements by 0.29 µg std m⁻³ (62%).
The best agreement is achieved at 10 to 12 km 
(two altitude bins each containing ~50 data 
points), where mean model OA matches the 
measurements within 15%. To understand the 
influence of nighttime isoprene on these mea- 
surements, in a sensitivity run, we set night- 
time values (determined by Sun elevation 
angle) to zero so that they cannot be lofted to 
the free troposphere and subsequently influ- 
ence the chemistry of organic aerosol. We find 
that nighttime isoprene plays only a small role 
(<0.035 µg std m⁻³) in the lower-troposphere
measurements but rises to as much as 0.098 µg
std m⁻³ (63%) in the mid- and upper troposphere,
as also indicated by the profiles along this
flight track (Fig. 4D). Using the GEOS-Chem 
model as an intermediary, we find that the 
nighttime detection of isoprene by CrIS over 
tropical South America is consistent with down- 
wind levels of OA observed by aircraft. 
Previous studies based on ensemble ana- 
lyses of aircraft observations have noted that 
chemistry models are missing a substantial 
source of free tropospheric OA (22, 23). In- 
creasing the anthropogenic component of OA 
helps to explain the measurement-model gap 
near source regions (24) but fails to reconcile 
measurements in the remote atmosphere (23). 
Similarly, improved knowledge of atmospheric 
chemistry only partly addresses the model bias 
(25). Our analysis suggests that over the tropics 
more attention should be given to the mag- 
nitude and distribution of isoprene and the 
chemical and physical processes that deter- 
mine its transport to the free troposphere at 
sunset, in addition to studying the atmospheric 
fate of isoprene oxidation products. By virtue 
of its atmospheric lifetime, IEPOX-SOA can be 
transported across the tropical upper tropo- 
sphere, where it represents about one-third of 
total OA (6), eventually subsiding to provide a 
large seasonal supply of cloud condensation 
nuclei in the lower troposphere (26-29).

GENE REGULATION


Genome organization controls transcriptional 
dynamics during development 


Philippe J. Batut*, Xin Yang Bing, Zachary Sisco, João Raimundo, Michal Levo, Michael S. Levine*


Past studies offer contradictory claims for the role of genome organization in the regulation of gene 
activity. Here, we show through high-resolution chromosome conformation analysis that the Drosophila 
genome is organized by two independent classes of regulatory sequences, tethering elements and 
insulators. Quantitative live imaging and targeted genome editing demonstrate that this two-tiered 
organization is critical for the precise temporal dynamics of Hox gene transcription during development. 
Tethering elements mediate long-range enhancer-promoter interactions and foster fast activation 
kinetics. Conversely, the boundaries of topologically associating domains (TADs) prevent spurious 
interactions with enhancers and silencers located in neighboring TADs. These two levels of genome 
organization operate independently of one another to ensure precision of transcriptional dynamics and
the reliability of complex patterning processes.


Genome organization is emerging as a
potentially important facet of gene
regulation (1-5). Because transcriptional
enhancers often reside far from their
target promoters, chromatin folding may
guide the timely and specific establishment of 
regulatory interactions (1, 3, 4, 6-10). Although
long-range enhancer-promoter contacts are 
prevalent, it remains unclear whether they 
actually determine transcriptional activity 
(9, 11). Boundary elements partition chromo- 
somes into topologically associating domains 
(TADs) (7, 12), whose importance for gene 
regulation remains controversial (8, 13-16). 
There is also an unresolved dichotomy between 
elements that promote and prevent enhancer- 
promoter interactions, because CTCF binding 
sites have been implicated in both (7, 9, 17). We 
show here that distinct classes of regulatory 
elements mediate these opposing functions 
genome-wide: Dedicated tethering elements 
foster appropriate enhancer-promoter inter- 
actions and are key to fast activation kinetics, 
whereas insulators prevent spurious inter- 
actions and regulatory interference between 
neighboring TADs. 

We characterized genome organization at 
single-nucleosome resolution in developing 
Drosophila embryos using Micro-C (18). We
focused on the critical ~60-min period preced- 
ing gastrulation, when the fate map of the 
embryo is established by localized transcription 
of a cascade of patterning genes, culminat- 
ing with the Hox genes that specify segment 
identity. Analysis of the Antennapedia gene 
complex (ANT-C), one of two Hox gene clusters 
and an archetype of regulatory precision, reveals 
an intricate hierarchical organization. Insulators 
partition the locus into a series of TADs, whereas 
tethering elements mediate specific intra-TAD
focal contacts between promoters of Scr
and Antp and their distal regulatory regions 
(Fig. 1A and fig. S1). 

The entire genome is similarly organized by 
2034 insulators and 620 tethering elements. 
Insulators and tethers display notably little 
physical overlap (Fig. 1B) and have sharply 
contrasting chromatin signatures (Fig. 1C; 
fig. S2, A to C; and table S1). Insulators are 
characterized by H3 lysine 4 trimethylation 
(H3K4me3) and the binding of canonical 
insulator proteins (CTCF, CP190), whereas 
tethers are distinguished by H3K4 monomethylation
(H3K4me1) and the binding of
pioneer factors Trithorax-like (Trl), grainyhead 
(grh), and zelda (zld; fig. S2, A and B). There 
are 103 focal contacts (33%) that connect 
promoters of protein-coding genes to “orphan” 
intergenic sequences, which we term distal 
tethering elements (DTEs; Fig. 1D); others 
connect different genes together. These con- 
tacts typically span tens of kilobases (mean 
43.5 kb; Fig. 1D and table S2) and are observed 
at many critical developmental loci, includ- 
ing vestigial and cut (fig. S3). Because DTEs 
generally display no enhancer activity in the 
early embryo (Fig. 1E and fig. S2D), we hy- 
pothesized that they might be organizational 
elements dedicated to fostering long-range 
enhancer-promoter interactions. In contrast to 
enhancers, DTEs retain an “open” chromatin 
conformation throughout embryogenesis (fig. 
S2E), consistent with evidence that focal con- 
tacts are stable across cell types (19) and de- 
velopmental stages (1). To explore their 
potential roles in transcriptional regulation, 
we systematically disrupted tethers and insu- 
lators throughout the ANT-C (tables S3 and 
S4) and leveraged quantitative live-imaging 
methods to measure changes in the tran- 
scriptional dynamics of Hox genes in devel- 
oping embryos (Fig. 1F). 

The Sex combs reduced (Scr) gene, contained
within a 90-kb TAD, is regulated by
an early embryonic enhancer (Scr EE) located
35 kb upstream of the promoter [figs.
S4 and S5; (20)]. This enhancer bypasses an 
intervening TAD that contains ftz—a highly 
expressed pair-rule gene—to selectively reg- 
ulate Scr transcription. A DTE situated 6 kb 
upstream of the enhancer anchors a focal con- 
tact with a promoter-proximal tether (Fig. 2A). 
These tethering elements correspond to se- 
quences previously shown by reporter assays 
to modulate enhancer-promoter selectivity 
(21, 22). The DTE lacks any intrinsic enhancer 
activity (fig. S4), suggesting a specific role 
in fostering long-range enhancer-promoter 
interactions. 

A targeted deletion of the DTE completely 
abolishes this focal contact and diminishes 
interactions between the EE enhancer and 
the Scr promoter (Fig. 2A and figs. S6 and 
S7). Single-cell transcription measurements
in living embryos reveal a marked delay in the 
dynamics of Scr activation across the cells of 
the prospective stripe (Fig. 2B). Transcription 
levels in nuclei that become active appear 
unaffected, and the mutant allele ultimately 
reaches a regime of activity indistinguishable 
from that of the wild type. Overall, the mutant 
allele is active in the appropriate spatial do- 
main, but its transcriptional output is sub- 
stantially reduced owing to the delayed onset 
of expression (Fig. 2B and fig. S6A). Deletion of 
the EE enhancer reduces Scr transcription but 
does not disrupt the focal contact—it may even 
be somewhat strengthened (Fig. 2A and fig. 
S8). These observations suggest that promoter- 
DTE focal contacts are autonomous features of 
the regulatory genome. Disruptions of focal 
contacts have strictly gene-specific effects: De- 
letion of the Scr DTE has no impact on the 
structure or transcription of the neighboring 
Dfd locus (fig. S6, C to E).

Similarly, the Antennapedia (Antp) P1 early 
enhancer is associated with a DTE directly 
adjacent to it, which forms a focal interaction 
with a tethering element near the P1 promoter, 
38 kb away. Upon deletion of the DTE, the 
focal interaction is lost, and enhancer-promoter 
interactions are disrupted (figs. S6 and S7).
Antp activation is substantially delayed but 
transcription levels in active nuclei are nor- 
mal, and transcription appears to fully recover 
after this initial lag (fig. S9). 

These observations show that DTEs specif- 
ically determine the dynamics of transcrip- 
tional activation in development. This temporal 
precision may be critical for the programming 
of cellular identities within stringent develop- 
mental windows. We propose that tethering 
elements foster physical interactions between 
promoters and remote enhancers to prime 
genes for rapid activation; they may also 
modulate other aspects of enhancer-promoter 
communication through interactions with core 
transcription complexes.

In addition to fostering preferential associations
with target promoters, DTEs also suppress
“backward” interactions of associated enhancers 
with distal regions of their TADs (Fig. 2A and 
fig. S7). Both effects probably synergize to in- 
crease the specificity of enhancer-promoter 
communication. Although DTE deletions have
a strong impact on local genome organization,
they have little effect on the overall structure 
of TADs (Fig. 2A and fig. S7), suggesting that 
insulators and tethering elements operate 
largely independently of one another. To better 
understand the relationship between long-range 
enhancer-promoter interactions and TAD
structures, we systematically disrupted each of
the TAD boundaries across the Dfd-Scr-Antp 
interval (tables S3 and S4). 

Deletion of the Dfd 3’ insulator causes a 
wholesale fusion of the Dfd TAD with the adjacent
miR-10 TAD and reduces transcription
of the Dfd gene (Fig. 3A and figs. S10 to S12). 


Fig. 1. Hierarchical genome organization: Bound- 
aries and focal contacts. (A) ANT-C organization 
(Dfd-Antp interval). The following are shown from top 
to bottom: Micro-C contact map showing TADs and 
focal contacts (arrows); Hox genes (black); other 
genes (gray); regulatory elements; and chromatin 
immunoprecipitation (ChIP) data for Trl and CP190. 
(B) Tethers and boundaries are physically distinct. 

(C) Epigenetic signatures of tethers and boundaries. 
DNase, deoxyribonuclease I. (D) Fraction of contacts
connecting gene promoters to “orphan” DTEs 

and a histogram of loop spans (black, all loops). 

(E) Enhancer activity, by functional class (***p < 10” 
versus Zld peaks, Bonferroni-corrected chi-square 
test; n.s., not significant). (F) Image of a live embryo 
showing transcription of Dfd, Antp (cyan), and Scr 
(yellow, image enhanced), with nuclei in purple. 


Fig. 2. Tethering elements foster enhancer- 
promoter interactions and control activation 
kinetics. (A) Micro-C for Scr DTE mutant embryos. 
(The triangle indicates the location of the deletion.) 
Virtual 4C (v4C) shows decreased interactions 

of the EE enhancer with the promoter upon DTE 
deletion (arrow) and increased interactions with 
regions beyond the DTE (asterisk). The focal contact 
persists in ΔScrEE embryos (inset). (B) Live measurements
of endogenous Scr transcription show
delayed activation in ΔScrDTE embryos. (A-P,
anterior-posterior; FU, fluorescence units; N, number
of embryos; shading, ±SEM.)

Fig. 3. Insulators prevent regulatory interference and promote transcriptional
precision. (A) Micro-C and Dfd transcription measurements for
ΔDfd3' insulator mutant embryos. The triangle indicates the location of the
deletion. (B) Micro-C and Scr transcription measurements for ΔSF1 (and
ΔSF2) embryos. The focal contact persists (arrows


Notably, it does not appear to weaken inter- 
actions between the Dfd promoter and enhancer, 
suggesting that TAD boundaries play no role 
in fostering appropriate regulatory interactions. 
Rather, the 3’ insulator specifically prevents 
inappropriate contacts with the miR-10 regu- 
latory region. 

Similarly, individual deletions of the boundaries
of the ftz TAD, which is nested within
the Scr locus, cause fusions with either side of 
the Scr TAD. The remaining insulator contin- 
ues to enforce a robust boundary (Fig. 3B and 
figs. S10 to S12). Scr transcription is markedly 
reduced in both cases, though the deletion of 
SF1 has a substantially more severe impact than 
SF2 (Fig. 3B and fig. S10). Neither deletion 
disrupts the promoter-DTE interaction (fig.
S11), suggesting that TAD boundaries are not
required for the establishment or mainte- 
nance of long-range focal contacts. This sup- 
ports the view that tethers and boundaries 
constitute independent levels of organization, 
as suggested by our genome-wide analysis. 

The disruption of Scr TAD boundaries is 
also consistent with this model. Deletion of 
the Scr 3' insulator is recessive lethal, probably
because of the loss of essential 7SL genes, 
and could not be analyzed by Micro-C. But a 
targeted deletion of the Antp 3’ intronic 
insulator is viable and causes a partial fusion 
of the Scr and Antp P2 TADs (figs. S10 to S12).

in ΔScr3' and ΔAntp3' embryos. (D) Antp transcription in ΔAntp3' embryos.
(E) Interaction landscape of the Scr promoter upon disruption of the ftz
TAD (see Micro-C above). (F) Reporter assay showing silencing by the AE1
enhancer within the Scr expression domain (dashed box). In all panels,
shading indicates SEM.


The persistence of a residual boundary can 
be explained by the presence of a secondary 
insulator located ~4 kb away. Deletion of either
Scr TAD boundary severely reduces Scr tran- 
scription (Fig. 3C and figs. S10 and S11). Notably, 
disruption of the Scr-Antp boundary does not 
weaken the interaction of the DTE with the Scr 
promoter (fig. S11), suggesting that reduced Scr
expression is not due to diminished enhancer- 
promoter interactions. This partial fusion of 
the Scr and Antp P2 TADs has, at most, only 
a marginal impact on Antp transcription 
(Fig. 3D and figs. S10 and S11), revealing that 
boundary deletions can have sharply asym- 
metric regulatory effects on flanking TADs. 
Because TAD boundary deletions do not 
alter appropriate enhancer-promoter inter- 
actions, we sought an alternative explanation 
for reduced Scr transcription arising from disruptions
of the ftz TAD. SF1 removal exposes
the Scr promoter to interactions with the ftz 
regulatory region (Fig. 3E and fig. S11), which 
may thus directly interfere with Scr tran- 
scription. By contrast, SF2 removal allows ftz 
regulatory sequences to interact with the EE 
enhancer (fig. S11), but not directly with the 
Scr promoter (Fig. 3E and fig. S11), which 
may explain its more subtle transcriptional 
impact. In the absence of SF1, the severely
narrowed Scr domain and distinctive ectopic 
stripes suggest both activation and silencing
by ftz enhancers (fig. S13). A prime suspect for
this altered expression pattern is the AE1 en- 
hancer, which binds both activators and the 
Hairy repressor (fig. S13). Indeed, the AE1 ele- 
ment functions as a potent silencer within the 
Scr expression domain (Fig. 3F and fig. S10), 
and Scr transcription faithfully mirrors AE1
activity upon SF1 removal (fig. S13). We conclude
that the primary function of insulators
is to prevent regulatory interference between 
TADs, and this can explain even surprising 
quantitative differences in the transcriptional 
effects of boundary deletions. 

To assess the functional importance of 
tethering elements and insulators, we analyzed 
the number of teeth on the sex combs of adult 
males, a quantitative phenotype under sexual 
selection governed by Scr expression. All rel- 
evant deletions reduce the average number of 
teeth, and the magnitude of the transcriptional 
defects is highly predictive of the severity of the 
morphological phenotypes (Fig. 4, A and B, and 
fig. S14). These observations demonstrate the 
importance of genome structure for the control 
of transcriptional dynamics and the precision of 
developmental patterning. 

Taken together, our observations support a 
general model in which genome organization 
canalizes regulatory interactions through two 
classes of organizing elements with diametrically
opposing functions.

Fig. 4. Genome organization controls transcriptional dynamics and developmental patterning.
(A) Representative images of sex combs from adult males. (Numbers indicate tooth counts.) (B) Correlation
of transcriptional output and tooth count (inset, locus map; red bar, sex comb enhancer; error bars,
±SEM). (C) Organization of the Scr locus: Tethers foster specific enhancer-promoter interactions, whereas
boundaries prevent regulatory interference between TADs.

A dedicated class
of tethering elements, often physically distinct
from enhancers, foster enhancer-promoter 
interactions and are key to fast transcriptional 
activation kinetics during development (Fig. 4C). 
We anticipate that similar mechanisms will 
prove to be an important property of verte- 
brate genomes, where large distances often 
separate genes from their regulatory sequences 
(9, 23, 24). By contrast, TAD boundaries have a 
pervasive role in enforcing regulatory specificity 
by preventing interference between neighbor- 
ing TADs (Fig. 4C). 

Although prior studies have emphasized the 
spatial regulation of gene expression, temporal 
dynamics have proven far more elusive. Quan- 
titative measurements in live embryos revealed 
clear delays in the onset of transcription upon 
deletion of tethering elements. The Trl pro- 
tein, which binds most of these sequences, has 
been proposed to act as a DNA looping factor 
(25, 26). We suggest that tethering elements 
“jump-start” expression by establishing enhancer-promoter
loops before activation, though it is
likely that they also serve a broader function.
Indeed, it is intriguing that the Scr DTE coincides
with a classical Polycomb response ele-
ment (27). This is consistent with a possible role 
for Polycomb repressive complex 1 (PRC1) com- 
ponents in the establishment of enhancer- 
promoter loops (28) and suggests that focal 
contacts constitute a versatile topological 
infrastructure used by a variety of regulatory 
mechanisms. Our study shows that genome 
organization shapes transcription dynamics 
through two complementary mechanisms: 
Tethering elements foster appropriate enhancer- 
promoter interactions, whereas TAD boundaries 
prevent inappropriate associations.

Probing subthreshold dynamics of hippocampal
neurons by pulsed optogenetics 


Manuel Valero, Ipshita Zutshi, Euisik Yoon, György Buzsáki


Understanding how excitatory (E) and inhibitory (I) inputs are integrated by neurons requires monitoring 
their subthreshold behavior. We probed the subthreshold dynamics using optogenetic depolarizing 
pulses in hippocampal neuronal assemblies in freely moving mice. Excitability decreased during
sharp-wave ripples coupled with increased I. In contrast to this “negative gain,” optogenetic probing showed
increased within-field excitability in place cells by weakening I and unmasked stable place fields in
initially non-place cells. Neuronal assemblies active during sharp-wave ripples in the home cage 
predicted spatial overlap and sequences of place fields of both place cells and unmasked preexisting 
place fields of non-place cells during track running. Thus, indirect probing of subthreshold dynamics in 
neuronal populations permits the disclosing of preexisting assemblies and modes of neuronal operations. 


Understanding how neurons integrate
excitatory (E) and inhibitory (I) inputs
requires access to the neuron’s subthreshold
dynamics (1-4). Because intracellular
monitoring of cell assemblies
in behaving animals is currently unrealistic, 
different single-cell modes of operations (or 
“models”) have been proposed to explain firing 
characteristics in various circumstances (Fig. 1, 
A and B, and fig. S1) (2). In the “tuned excitation”
(“blanket” inhibition) (1, 5) and “balanced
network” models (I activity tracks E changes)
(6-8), both membrane polarization (Vm) and
firing rate response decrease at more depolarized
Vm (Fig. 1, A and B) (9-12). By contrast, in
the “reciprocal network” model, reduction of I
is coupled to Vm depolarization and increased
firing rate (Fig. 1, A and B) (12-14). Thus, by
varying Vm experimentally and observing the
changes in firing rates, one can gain access 
to the subthreshold behavior of neurons (fig. 
S1). Adding active conductances to the model 
neuron affected its quantitative features but 
did not change these predictions qualitatively 
(figs. S2 and S3).

We probed Vm with short optogenetic pulses.
Using micro-light-emitting diode (µLED)
probes (four shanks with three µLEDs on each
shank) (5), we recorded and probed large numbers
of CA1 pyramidal neurons simultaneously
in freely moving calcium/calmodulin-dependent 
protein kinase II alpha (CamKIIq) -Cre::Ai32 
mice (Fig. 1, C and D, and fig. S4; n = 822 
pyramidal neurons in four mice; 43.3 + 8.37 
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pyramidal neurons per session). uLEDs were 
activated (0.02 to 0.1 1 W, 20 ms duration) with 
randomly variable (20 to 40 ms) offsets so 
that stimulation of each site reccurred at 
~0.3- to 0.6-s intervals Fig. 1C and fig. S5). 
Random intervals (20 ms) between the light 
pulses served as control epochs for compar- 
ison (materials and methods). Of 822 neurons, 
611 responded unequally to the three neigh- 
boring tLEDs, owing to their different dis- 
tances from the recorded neurons (Fig. 1, D 
and E, and figs. S4 and S5), and these re- 
sponses were used as a proxy for estimating 
relative changes of V,,, and E/I dynamics. The 
evoked spike responses varied as a function 
of brains state (fig. S6) but did not perturb 
the firing features of the neurons (fig. $7). 
No changes were observed in nonresponsive 
neurons, safeguarding against local network- 
induced effects (fig. S8). 

During sharp-wave ripples (SPW-Rs), excit- 
atory neurons increased their firing rates more 
than inhibitory neurons (fig. S9) (6). In con-
trast to this population gain of excitation,
light-induced spike responses in individual
pyramidal cells decreased during SPW-Rs
(ΔRate; Fig. 1, F to I). Increasing Vm depo-
larization decreased the light-induced re-
sponse during SPW-Rs (Fig. 1J), resembling
the balanced mode of operation (Fig. 1A). This 
conclusion was further supported by the neg- 
ative correlation between firing-rate change 
during SPW-R and baseline firing rates of 
neurons (ρ = -0.19, P <10~’; fig. S10) and more
directly by intracellular experiments, in which
Vm was systematically varied (Fig. 1, K to M),
reproducing the effect seen with optogenetic
Vm depolarization (Fig. 11) and favoring the
balanced E/I model. 
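
The rate-change measure used above (ΔRate, the light response minus the control rate) can be illustrated with a minimal Python sketch. The function names, the input format (mean spike counts per 20-ms window for each neuron), and the example numbers are assumptions made for illustration only; this is not the authors' analysis code.

```python
def delta_rate(light_counts, control_counts, window_s=0.02):
    """Per-neuron rate change (Hz): light-evoked rate minus control rate.

    light_counts / control_counts: mean spike count per window for each
    neuron (hypothetical input format); window_s: window length in seconds.
    """
    if len(light_counts) != len(control_counts):
        raise ValueError("inputs must have one entry per neuron")
    return [(l - c) / window_s for l, c in zip(light_counts, control_counts)]


def spwr_effect(delta_in_spwr, delta_outside):
    """Per-neuron difference of the rate change inside versus outside SPW-Rs.

    Negative values mean the light response was smaller during SPW-Rs.
    """
    return [i - o for i, o in zip(delta_in_spwr, delta_outside)]


# Made-up example with two neurons
inside = delta_rate([0.30, 0.20], [0.20, 0.15])
outside = delta_rate([0.40, 0.30], [0.10, 0.05])
effect = spwr_effect(inside, outside)
```

With these invented counts both entries of `effect` are negative, which is the sign of the "negative gain" described in the text.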

Next, we examined the subthreshold E/I dy- 
namics of place cells. During track running, 
three blocks of 10 baseline runs on a linear 
track were interleaved with two blocks of 40 
to 50 stimulation runs (fig. S7D). We observed a 
gain at the trough of the theta cycle, the phase 
corresponding to the strongest synchrony of
pyramidal neurons (Fig. 2A) (16). We then
compared neuronal excitability within and
outside the place fields of place cells (17).
With a standard definition of “place field” 
(materials and methods) (18), more than half 
of the pyramidal neurons were classified as 
place cells [73% and 71% of light-responsive 
and nonresponsive neurons, respectively; P = 
0.90, χ² test; (17-19); fig. S11]. The induced
spike responses varied within and outside the 
place field (Fig. 2, C and D). The induced rate 
increase was higher within than outside the 
place field (Fig. 2E and fig. S12). Increasing 
depolarization of Vm by stronger light inten-
sity increased the in-field gain several-fold
(Fig. 2F). This in-field gain was positively 
correlated with both the out-of-field firing 
rate and the home-cage firing rate of the 
neuron (ρ = 0.17, P<10~° and ρ = 0.25, P <10~’,
respectively; fig. S12). No rate changes were
observed in nonresponsive pyramidal neurons 
(fig. S8). These results support the reciprocal 
mode of operation. 
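
As a minimal sketch of the in-field gain described above, the light-induced rate increase can be averaged inside and outside the place field and the two averages compared. The function name, the per-position-bin input format, and the numbers are hypothetical and chosen only to make the arithmetic visible; the published analysis may differ in detail.

```python
from statistics import mean


def in_field_gain(light_rate, control_rate, in_field):
    """Mean light-induced rate increase inside the field minus outside (Hz).

    light_rate / control_rate: firing rate per position bin (Hz).
    in_field: one boolean per bin, True where the bin lies in the place field.
    All three inputs are hypothetical formats used for illustration.
    """
    induced = [l - c for l, c in zip(light_rate, control_rate)]
    inside = [x for x, f in zip(induced, in_field) if f]
    outside = [x for x, f in zip(induced, in_field) if not f]
    if not inside or not outside:
        raise ValueError("need at least one in-field and one out-of-field bin")
    return mean(inside) - mean(outside)


# Made-up example: four position bins, the middle two inside the field
gain = in_field_gain(
    light_rate=[2.0, 9.0, 10.0, 2.5],
    control_rate=[1.0, 4.0, 5.0, 1.5],
    in_field=[False, True, True, False],
)
```

A positive `gain` corresponds to a larger induced increase within the field than outside it.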

Light responses in place cells, tested in the 
home cage before the track, were significantly 
stronger than in non-place cells, and these 
results cannot be explained by differences in 
firing rate (Fig. 3A and fig. S13), suggesting 
that neurons with higher excitability more 
likely express place fields. In support of this 
hypothesis, optogenetic depolarization revealed 
place fields in the majority of non-place cells 
(Fig. 3, B and C; 69.3%, 289 of 417; materials 
and methods), although the in-field gain was 
less for the induced place fields than for real 
place fields (Fig. 3D and fig. S13). We found a 
robust correlation between the spatial location 
of induced place field spikes and the sparse 
spikes of non-place cells in the absence of 
stimulation (Fig. 3, C and E; “ghost fields”). 

Features of the optogenetically unmasked 
place fields were similar to those of real place 
fields (Fig. 3, F and G, and fig. S12). To anchor 
neuronal firing to behavior, we examined 
the precision by which the animal’s position 
on the track can be predicted by active neu- 
rons (19). The root mean squared error of the 
decoded position was highest for the sparse 
non-place cell spikes and lowest for light- 
boosted spikes of place cells. The induced 
spikes of non-place cells more accurately 
predicted the mouse’s position on the track 
than those of “bona fide” place cells (Fig. 3H 
and fig. S11). 
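
The decoding comparison above rests on the root mean squared error between decoded and actual positions. A minimal sketch of that error measure follows; the position values are invented, and the decoder that would produce the decoded positions is not shown.

```python
import math


def rmse(decoded_cm, actual_cm):
    """Root mean squared error between decoded and actual positions (cm)."""
    if len(decoded_cm) != len(actual_cm) or not actual_cm:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(d - a) ** 2 for d, a in zip(decoded_cm, actual_cm)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up positions on a linear track (cm)
actual = [10.0, 20.0, 30.0, 40.0]
sparse_decode = [16.0, 14.0, 36.0, 34.0]   # every estimate is off by 6 cm
boosted_decode = [12.0, 18.0, 32.0, 38.0]  # every estimate is off by 2 cm

error_sparse = rmse(sparse_decode, actual)
error_boosted = rmse(boosted_decode, actual)
```

A lower value indicates that the spikes used for decoding predict the animal's position more precisely.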

We found a reliable correlation between 
spatial correlations of place cell pairs on the 
track and firing rate correlations of the same 
pairs during SPW-Rs in the home cage (Fig. 4, 
A to C) (20). No such relationship was present
for non-place cell pairs (Fig. 4C). However, 
during optogenetic stimulation, the relation- 
ship between cofiring during SPW-Rs and spa- 
tial overlap was revealed for unmasked place 
fields of non-place field pairs (Fig. 4C). To
study the population consequence of the pair-
wise effects, we performed independent com-
ponent analysis (ICA) on the Z-scored spike
matrix of pyramidal neurons (21) to extract
patterns of higher-order cofiring in the home
cage (Fig. 4D). Assembly members of place
cells, but not of mixtures of place and non-
place cells, showed higher spatial correlation
than chance (Fig. 4E). However, when spikes
of unmasked place fields were considered,
they expressed spatial correlation at the level
of real place cells (Fig. 4F and fig. S14). Se-
quential firing of place cells was correlated
with spike sequences during SPW-Rs (Fig. 4G)
(22, 23). The fraction of SPW-R events with
significant virtual track trajectories increased
when unmasked place fields were also included
for the construction of the place field sequence
template (Fig. 4H).

Optogenetic depolarization of neurons in-
creased the within-field firing rate gain in hip-
pocampal place cells and unmasked place
fields in non-place cells (1, 24-26), implying
that almost any pyramidal cell can express a
place field and that the entire CA1 population
contributes to forming specific attractors or
trajectories in any given situation (1, 24, 26, 27).
In these preconfigured attractors (23, 28, 29),
neurons with the highest excitability form a
scaffold map and emit high-enough spike rates
to be classified as place cells (17). Place cells
are not continuously “driven” by outside cues
(30, 31) but emerge by transient disinhibition,
perhaps coupled with excitation, as predicted
by the reciprocal mode of operation and fur-
ther supported by the position-dependent
firing rates of inhibitory interneurons as well
as the decreased inhibition of place cells with-
in their fields (fig. S15). Our results challenge

Fig. 1. Decreased single-neuron excitability during SPW-Rs. (A) Different
relationships of the excitatory and inhibitory conductances (top row) lead to
specific membrane potential (middle row) and firing rate (bottom row). (B) Rate
predictions as a function of the holding Vm. (C) CA1 neurons in CamKIIα-Cre::
Ai32 mice respond to 20-ms random pulses. (D) Probe shank locations.
(E) Peristimulus time histogram (PSTH) from a pyramidal cell responding to
three μLEDs on the same shank but not one to nine μLEDs on different shanks.
(F) (Top left) Histograms showing responses to light pulses (red) and control rate
(black dashed line) during SPW-Rs (30 bins of 50 ms). (Bottom) Response
displayed for the same single neuron. (Right) Two other example cells. (G) Z-scored
control rate (left), optogenetic responses (center), and rate change (ΔRate = μLED
responses - control; right) during SPW-Rs for all light-responding cells ranked by light-
response amplitude. (H) Group control firing rate (black), light response rate (red)
[mean ± confidence interval at 95% (CI95), bottom], and rate difference (gold; mean ±
CI95). (I) Optogenetic responses decreased during SPW-Rs (ΔRate; n = 485 neurons;
P.<10™, Wilcoxon paired signed-rank test). (J) (Top) Difference between in-SPW-R
versus outside SPW-R firing rates as a function of three light intensities in three
neurons. (Bottom) Population average (mean ± CI95; ρ = -0.55, P<10°°”:P<10°
for all comparisons, Friedman test). (K) Pyramidal neuron filled with biocytin from
a head-fixed waking mouse experiment. (Bottom) Responses of the filled neuron
at different Vm (three traces are highlighted in black) during SPW-Rs (top
gray line, average ripple). (L) Relationship between the holding Vm and Vm change
(left) and firing-rate change (right) for all SPW-R in (K). (M) Group results for
five cells from five anesthetized rats (green) and five cells from four head-fixed
mice (pink). (Right) Decreased gain during SPW-Rs (P = 0.002; Wilcoxon paired
signed-rank test). **P < 0.01 and ***P < 0.001.

Fig. 2. Increased excitability during theta
oscillations and within place fields.
(A) Thin line, theta phase. Red and
black lines, phase histograms of spikes
during optostimulation and control pulses,
respectively (mean ± CI95). (B) Rate
gain at the trough of the theta cycle
(P < 10°©, Wilcoxon test). (C) (Top)
Light-induced spike histograms (red line)
and control rate (black line). (Bottom) for
the same single neuron. (D) Control,
light responses and difference (resp-con)
for all light-responsive neurons, ranked by
the control rate peak position. (E) Responses
were larger inside than outside the place
field (n = 553 place fields; P < 10°:
Wilcoxon test). (F) (Top) Difference between
in-field and out-of-field firing rates (gain),
as a function of three light intensities
in three neurons (top) and group average
(bottom; ρ = 0.24,P < 10° P< 10%,
Friedman test). ***P < 0.001.
place cells (top, P < 107°,
χ² test against 500 shuffles) and unmasked place fields of non-place cells (bottom; P < 102°, χ² test). (F) Spike activity for an example non-place cell during
baseline (no stimulation) runs (1 to 10, 61 to 70, 121 to 130) and stimulation runs. Spike activity during light stimulation (20-ms pulses; right panel) and between
stimulation (control) epochs (left panel). Correlation between first and second halves (trials 11 to 60 and 71 to 120) of the session was used to compute place
field stability. (Bottom) Place fields during light stimulation were more stable for both true place fields and unmasked place fields [P < 10°? and P< 10°7°,
respectively; two-way analysis of variance (ANOVA)] than during control epochs. (G) Correlation of spatial information (Bits/spike) between control and light-
stimulated epochs (P < 10-® and P < 10°°®, respectively; KW test). Black line shows an exponential fit. Light-boosted effect was stronger for unmasked place
fields of non-place cells than for place cell place fields (P < 10°°; KW test). (H) Spatial decoding accuracy of the mouse’s position on the track increased during
light-stimulation epochs of both non-place cells and place cells (P < 10-° and P < 10°’ for control and light epochs, respectively; two-way ANOVA). Note lower mean
squared error (MSE) during light stimulation of non-place cells compared to control spiking of place cells (P = 0.001, Tukey test). **P < 0.01 and ***P < 0.001.
Fig. 4. Uncovering spatial overlap of preexisting cell ensembles. (A) Neural
sequence of place cells and non-place cells during a SPW-R. (B) Similarity
matrices show cofiring of 47 pyramidal neurons in an example session during
SPW-R in the home cage and spatial correlations of the same pairs (Spearman's ρ)
during control and light-stimulation epochs on the track. (C) SPW-R cofiring was 
positively correlated with spatial overlap in place cell pairs (P < 10-*’, Spearman 
correlation) but not in pairs with place and non-place cell (non-PC) partners 
(P = 0.11). (D) Same as (C), but for light-induced responses (P < 107°, Spearman 
correlation; P < 10° for place cell pairs; P = 0.002 between control and light 
stimulation after correcting by the spatial cofiring: repeated-measures ANOVA). 

(E) Cell assemblies (21) in home cage recordings. Relative weights of neurons in an
example assembly. Neurons with > 2 SDs (dashed gray line) of the weight dis-
tribution were classified as members of the assembly. (F) Spatial overlap on
the track (ρ) was higher among assemblies consisting of only place cells than
for assemblies of mixed place cells and non-place cells (P = 0.006, P = 0.31, and P =
0.02 for assemblies, light stimulation, and their interaction, respectively; two-way 
ANOVA). (G) Forward replay sequence during home cage recording. Bayesian 
decoding (22). (H) The fraction of SPW-R events with significant trajectories (against 
500 shuffles) increased (P < 10~*, Friedman test) when unmasked place fields of 
non-place cells (P <10~*, Tukey test) and when spikes from both stimulated 
place fields and unmasked place fields (P <10~°) were included for the construction 
of the place field sequence template. *P < 0.05, **P < 0.01 and ***P < 0.001. 


the current notion of spatially uniform inhi- 
bition underlying place cell properties and 
reconcile several models of place field emer- 
gence (1, 25, 26, 29, 32). 

Optogenetic perturbation during the theta 
cycles and SPW-Rs revealed opposite excit- 
ability rules, and the SPW-R data were best 
fitted by a balanced network model (33-35). 
Even though SPW-R represents the highest 
excitability state of the CA1 network, the con- 
tributing individual neurons decrease their 
excitability. This negative gain enables larger 
rate changes of slow firing, compared to fast 
firing, neurons during SPW-R. By contrast, 
the reciprocal mode of operation during ex- 
ploration allows for larger in-field gain for 
faster-firing, compared to slow-firing, neu- 
rons. Brain state-dependent shifts between 
the reciprocal and balanced E/I modes of ope- 
rations may be brought about by the altered 
temporal relationship between interneuron 
and pyramidal cell spiking and the consequent 
Vm (36), perhaps set by subcortical neuro- 
DEVELOPMENT


Establishment of mouse stem cells that can
recapitulate the developmental potential
of primitive endoderm


Yasuhide Ohinata, Takaho A. Endo, Hiroki Sugishita, Takashi Watanabe, Yusuke Iizuka,
Yurie Kawamoto, Atsunori Saraya, Mami Kumon, Yoko Koseki, Takashi Kondo,
Osamu Ohara, Haruhiko Koseki


The mammalian blastocyst consists of three distinct cell types: epiblast, trophoblast (TB), and primitive 
endoderm (PrE). Although embryonic stem cells (ESCs) and trophoblast stem cells (TSCs) retain the 
functional properties of epiblast and TB, respectively, stem cells that fully recapitulate the 
developmental potential of PrE have not been established. Here, we report derivation of primitive 
endoderm stem cells (PrESCs) in mice. PrESCs recapitulate properties of embryonic day 4.5 founder 
PrE, are efficiently incorporated into PrE upon blastocyst injection, generate functionally competent PrE- 
derived tissues, and support fetal development of PrE-depleted blastocysts in chimeras. Furthermore, 
PrESCs can establish interactions with ESCs and TSCs and generate descendants with yolk sac-like 
structures in utero. Establishment of PrESCs will enable the elucidation of the mechanisms for PrE 
specification and subsequent pre- and postimplantation development. 


Epiblast, trophoblast (TB), and primitive
endoderm (PrE) are differentiated from
the zygote by the late blastocyst stage of
mouse preimplantation development and
contribute to generate the major parts of
the embryo, placenta, and yolk sac, respec- 
tively, during postimplantation development. 
Our understanding of the functional proper- 
ties of epiblast and TB has been enhanced by 
the use of embryonic stem cells (ESCs) and 
trophoblast stem cells (TSCs), which retain 
functional properties of epiblast and TB, re- 
spectively, and have provided experimental 
platforms to dissect their functions (1, 2). By
contrast, although extraembryonic endoderm 
cells (XENCs) have been derived from PrE, 
they do not fully recapitulate the developmen- 
tal potential of the PrE (3). 

The PrE lineage is vital for normal embry- 
onic development and is the origin of the vis- 
ceral endoderm (VE) of the visceral yolk sac, 
the parietal endoderm (PE) of the parietal yolk 
sac, and the marginal zone endoderm (MZE) 
lining the boundary between the placenta and 
chorionic plate of the placental disk (4). These 
extraembryonic endoderm tissues play crucial 
roles in nourishing the embryo through nutrient 
provision at the maternal-fetal interface (5), es- 
pecially before the establishment of placental 
circulation [approximately embryonic day 10.5 
(E10.5) in mice], in anterior-posterior pattern-
ing of the epiblast during gastrulation (6), and
in yolk sac hematopoiesis (7). VE cells derived 
from the PrE are further suggested to contrib- 
ute to formation of the fetal gut (8). 

To gain further insight into how the PrE 
functions during pre- and early postimplan- 
tation development, it would be useful to es- 
tablish stem cell lines that fully retain PrE 
developmental potential beyond that pos- 
sessed by XENCs, which can only contribute 
to the distal region of the PrE (3). We tested for 
culture conditions that enabled robust growth 
of PrE-derived cells from blastocysts (Fig. 1A 
and table S1) and examined the impact of se- 
rum, which provides undefined differentiation 
signals. As reported (3), serum-containing cul- 
ture allowed efficient derivation of XENCs 
from the inner cell mass (ICM). Derivation 
efficiency of XENCs from the ICM was further 
facilitated by addition of 3 μM CHIR99021,
a GSK3 inhibitor, but was suppressed by the
mitogen-activated protein kinase (MAPK)
kinase (MEK) inhibitor PD0325901 (Fig. 1B
and fig. S1A). By contrast, serum-free medium
could not support colony formation from the
ICM by itself, but addition of 3 μM CHIR99021
promoted characteristic outgrowths and sub-
sequent generation of dome-shaped colonies
(Fig. 1B and fig. S1A). This effect of CHIR99021
was inhibited by PD0325901, leading to the 
induction of ESCs. These results suggested 
that the CHIR99021-induced cells exhibited 
a competitive period for fate decision with 
ESCs and originated from PrE rather than 
epiblasts. 

We next investigated whether CHIR99021- 
induced cells retained the functional proper- 
ties of PrE by maintaining them under various 
serum-free culture conditions (Fig. 1C). The 
maintenance of these cells was dependent on
CHIR99021. Leukemia inhibitory factor (LIF)
did not replace CHIR99021 to maintain pro-
liferation, suggesting that these cells are dif-
ferent from ESCs. However, increasing the
concentration of CHIR99021 to 10 μM accel-
erated their proliferation and resulted in a
more compacted colony morphology. The im- 
pact of fibroblast growth factor 4 (FGF4) 
and/or platelet-derived growth factor-AA 
(PDGF-AA), which are known to expand the 
PrE in blastocysts (9, 10), was also tested and 
found to accelerate the growth rate of the 
CHIR99021-induced cells even further, with- 
out any additional morphological changes. 
The CHIR99021-induced cells transformed 
into XENC-like cells in serum+C3 medium but 
were not maintained in ESC medium (fig. S1, B 
and C). Therefore, CHIR99021-induced cells 
appear to be PrE derivatives captured at an 
earlier stage than were XENCs. Together with 
their self-renewing property in culture, we ten- 
tatively refer to CHIR99021-induced cells de-
rived and maintained in +C10F4PDGF medium
as primitive endoderm stem cells (PrESCs). 
Next, we explored the developmental origin 
of PrESCs by comparing the gene expression 
profiles revealed through single-cell RNA- 
sequencing (scRNA-seq) data from the ICM of 
E3.5 and E4.5 blastocysts. To enable a direct 
comparison, scRNA-seq analysis for PrESCs 
was performed in parallel with ESCs, TSCs, 
and XENCs. The homogeneity of PrESCs and 
their similarity to other cell lines were exam- 
ined through principal components analysis 
(PCA). PrESCs were found to represent a ho- 
mogenous population that is distinct from 
ESCs, TSCs, and XENCs (Fig. 2A and table S2). 
Gene expression profiles of ESCs, PrESCs, and 
XENCs were further compared with those of 
individual cells in the ICM of E3.5 and E4.5 
blastocysts (11) by means of agglomerative 
clustering analysis (fig. S2A). Sixteen clusters 
were identified; the E4.5 ICM could be seg- 
regated into putative epiblasts and PrE through 
differential expression of genes from clusters 
1 to 5, which were barely expressed in the 
E3.5 ICM. Previously known PrE marker genes, 
such as Gata4 and Gata6, were enriched in 
cluster 1, which began to express these genes 
from E4.5. By contrast, pluripotent marker 
genes were enriched in clusters 6 and 10, 
which were expressed as early as E3.5. In cell 
lines, cluster 1 genes were expressed by PrESCs 
and XENCs but not by ESCs. Reciprocally, clus-
ter 6 genes were more abundantly expressed
in ESCs than in PrESCs and XENCs. Taken 
together, global clustering analysis revealed 
that PrESCs were closely related to putative 
PrE in the E4.5 ICM and to XENCs. To further 
investigate their similarity, we examined the 
expression of representative PrE marker genes, 
selected mainly from cluster 1, and pluripotent 
marker genes from clusters 6 and 10 (Fig. 2B). 
Fig. 1. Long-term culture of PrE-derived
cells. (A) Culture conditions tested in
this study. (B) Derivation of cells in
the indicated culture conditions from
blastocysts. Derived cells at passage 2
(P2) are shown. Serum+L and Serum+C3
conditions induced XEN-like colonies,
whereas +C3PD03 induced ESC colonies.
+C3 induced distinct dome-shaped
colonies. Scale bar, 100 μm. (C) Colony
morphology of CHIR99021-induced
cells maintained under the indicated
conditions for at least five passages.
Scale bar, 100 μm.


[Fig. 2 panel labels: E3.5, E4.5; ICM (90), Epi (19), PrE (48), ESC (184), PrESC (184), XENC (184); pluripotent genes; PrE genes. Axes: XENC log10(TPM), ESC log10(TPM), PrESC log10(TPM).]

Fig. 2. PrESCs retain properties of PrE. (A) PCA comparison of gene expression profiles among ESCs, TSCs, PrESCs, and XENCs. (B) Comparison of pluripotent and PrE
marker gene expression among E3.5 ICM, E4.5 ICM, ESCs, PrESCs, and XENCs. (C) Comparison of gene expression profiles of PrESCs with those of XENCs and ESCs. Red
and green dots indicate pluripotency and PrE marker genes, respectively. (D) The expression of OCT4, E-cadherin, and GATA6 in PrESCs and XENCs. Scale bar, 50 μm.
(B) Distribution of GFP-labeled XENCs and PrESCs in

The putative PrE expressed not only PrE mark-
ers but also some pluripotent markers such
as Pou5f1 [encoding octamer-binding tran-
scription factor 4 (OCT4)] and Cdh1 (encoding
E-cadherin). Whereas PrESCs expressed both 
PrE markers and certain pluripotent markers 
expressed in the putative PrE, XENCs failed to 
express these pluripotent markers (table S3). 
The differences between PrESCs and XENCs 
were confirmed with conventional RNA-seq 
analysis (Fig. 2C and table S4). Coexpression 
of the PrE marker (GATA6) and pluripotent 
markers (OCT4 and E-cadherin) in PrESCs was 
further validated with immunofluorescence 
analysis (Fig. 2D). Therefore, PrESCs retain 
the molecular properties of founder PrE, which 
appear around E4.5. Imprinted X-chromosome 
inactivation was consistently established in 
PrESCs as well as in XENCs (fig. S2, B and 
C, and table S4) (3, 12). We further examined 
the PrE origin of PrESCs by investigating 
derivation efficacy of PrESCs from ICM de- 
pleted of PrE by PD0325901, which dropped to 
50% of the untreated ICM (fig. S2, D and E). 
GATA6+OCT4+ monolayer cells that were
spread out from the ICM explants after 7 days
of culture in +C10F4PDGF were also reduced
upon PrE depletion (fig. S2F). These observa-
tions also support the PrE origin of PrESCs.

Distribution of GFP-labeled
XENCs and PrESCs 18 hours after blastocyst injection. Scale bar, 50 μm.
... E14.5 chimeric ... ntation of the PrESC
complementation assay. (D) Complementation of PrE-depleted blastocysts
by GFP-labeled PrESCs generated through 0.5 or 1 μM PD0325901 treatment.
E18.5 chimeric conceptuses revealed either partial or full reconstitution of
extra-embryonic endoderm derivatives by PrESCs. Scale bar, 1 mm.

Ohinata et al., Science 375, 574-578 (2022)    4 February 2022

Next, we investigated the differentiation ca-
pability of PrESCs by injecting 15 green fluores-
cent protein (GFP)-labeled PrESCs or XENCs
into blastocysts and examining their distribu-
tion after 18 hours. PrESCs were efficiently in-
tegrated into the PrE layer, whereas XENCs
were randomly localized in the blastocoel
(Fig. 3A and fig. S3A). In chimeric conceptuses
at later stages, PrESCs contributed to both VE
and PE, whereas XENCs were in only a small
portion of the PE (Fig. 3B and fig. S3, B to F).
To confirm the potential of PrESCs to con-
tribute to both VE and PE, individual PrESCs
were isolated and propagated. Each clone re-
tained both VE and PE potentials (fig. S3, G
and H). Next, we used a blastocyst comple-
mentation assay to examine whether PrESCs
can fully replace the endogenous PrE (Fig. 3C).
The PrE was depleted in blastocysts by treat-
ing them with PD0325901 for 48 hours (fig.
S4A) (13). Injected PrESCs efficiently recon-
stituted the PrE layer in PD0325901-treated
blastocysts and improved the frequency of
implantation of PrE-depleted blastocysts (fig.
S4B and table S5). Moreover, nine neonates 
were recovered from reconstituted blastocysts 
treated with 0.5 or 1 uM PD0325901. In these 
chimeras, PrESCs efficiently contributed to ex- 
traembryonic endoderm tissues and, in four 
cases, fully reconstituted these tissues (Fig. 
3D, fig. S4F, and table S5). Full complemen- 
tation of VE and PE by PrESCs was further 
confirmed in E6.5 embryos (fig. S4E); how- 
ever, no contribution of PrESCs to fetal tissues, 
including the gut, was observed (n = 9, all ob-
tained complementation chimeras at E18.5) (Fig.
3D and fig. S4F). PrESCs therefore can function- 
ally replace the endogenous PrE and thus can 
be annotated as PrE stem cells for the extra- 
embryonic endoderm lineage. 

We further investigated the differentiation 
capability of PrESCs by using a modified blas- 
toid self-organization method (14), in which
ESCs, TSCs, and PrESCs were sequentially as- 
sembled to form ETP (ESC/TSC/PrESC) com- 
plexes (ETPs hereafter) (Fig. 4, A and B). In 
this condition, respective stem cells estab- 
lished uniform contacts in ETPs by day 3 
(Fig. 4C). The differentiation status of the 
Fig. 4. Differentiation potential of PrESCs in ETPs. (A) Schematic representation
of ETP generation. (B) ETP morphology at day 3 and day 6. Scale bar, 200 μm.
(C) Higher magnification view of a day 3 ETP. TSC-derivatives are demarcated by
CDX2 expression. Scale bar, 100 μm. (D) Decidual reaction induced by ETPs on
prospective E7.5. (E) The ETP descendant on prospective E7.5. (Left) Vasculature is 
surrounding the ETP descendant. (Right) Merged view of PrESC-derivatives (green) in 


respective stem cells was assessed by means 
of scRNA-seq analysis of the respective stem
cells and day 3 and day 6 ETPs (fig. S5A). Ten 
clusters were identified by means of PCA, and 
clusters from different stem cells were jux- 
taposed but temporally separated, suggest- 
ing differentiation of respective stem cells in 
ETPs. The derivatives of ESCs and TSCs ap- 
propriately expressed differentiation-related 
genes in day 6 ETPs (fig. S5B). Through PCA 
for PrESC derivatives, PrESC descendants were 
found to exist in four distinct states (fig. S5C). 
A variety of PE-related or VE-related genes 
were expressed in cluster 1 or cluster 8, re- 
spectively (fig. S5D). PrESC can differentiate 
into PE- and VE-like cells and, reciprocally, 
facilitate differentiation of ESCs and TSCs in 
ETPs. To evaluate to what extent these ETPs 
represent normal development, day 3 ETPs 
were transferred into pseudopregnant uterus, 
and efficient implantation was observed, as re- 
vealed by decidua development, on prospective 
E7.5 embryos (23.3%, 28 of 120) (Fig. 4D). Im-
planted ETPs were elongated and associated
with yolk sac-like structures on their surface 
(Fig. 4, E and F). PrESC-derived SOX17+ cells
formed VE-like and PE-like sheets (Fig. 4, G and
H). Therefore, PrESCs can establish interactions 
with ESCs and TSCs and generate descendants 
with yolk sac-like structures in utero. ETPs, how- 
ever, failed to form normal embryos (Fig. 4G). 

This study presents a robust protocol for 
the efficient generation of PrE stem cell lines, 
which can be used to elucidate the mecha- 
nisms that underpin PrE specification in vitro 
and to reconstitute embryos with ESCs and 
TSCs in vivo. 
AAV9 Titer Kit

Gyros Protein Technologies has released
the Gyrolab AAV9 Titer Kit for rapid
determination of physical titer in adeno-
associated virus serotype 9 (AAV9) vector-
based cell and gene therapy manufacturing.
The kit enables researchers to improve
productivity and is based on a highly selective
AAV9 affinity ligand developed with Thermo Fisher Scientific’s 
CaptureSelect technology. These ligands are also the basis of 
POROS CaptureSelect AAV9 Affinity Resin, which is frequently 
used to purify AAV9 viral vectors. The AAV9 serotype is attractive 
as a gene-delivery vector, as it crosses the blood-brain barrier 
and targets the central nervous system with high efficiency. The 
AAV9 Titer Kit provides fast results, enabling 96 data points to be
generated in 80 min, an improved assay working range compared 
to ELISA methods, and can handle small sample volumes (10X 
less than an ELISA), shortening development timelines of novel 
biotherapeutics, including cell and gene therapies. 

Gyros Protein Technologies 

For info: +1-877-433-9400 

www.gyrosproteintechnologies.com 


RNA Prestain Loading Dye 

Biotium announces the release of EMBER500 RNA Prestain Loading 
Dye. This novel prestain loading dye is much more sensitive than 
conventional prestaining with ethidium bromide (EtBr), and is also 
compatible with blue LED gel imagers, unlike EtBr. Staining for RNA is 
commonly done with EtBr on a denaturing gel after electrophoresis. 
However, denaturing gels can be complicated to prepare and involve 
handling hazardous reagents. RNA prestaining with EtBr can be 
performed with nondenaturing agarose gels, but requires large 
amounts of the dye and still limits sensitivity severely. For maximum 
convenience, the prestain also includes formamide as well as 
electrophoresis tracking dyes, allowing sample denaturing, loading, 
tracking, and staining in a single step. In addition, EMBER500 stains 
both RNA and DNA, allowing detection of contaminating genomic 
DNA in purified RNA samples. Detection of EMBER500 is flexible,
with compatibility for UV transilluminators and blue LED gel imagers, 
which eliminate UV exposure hazards. 

Biotium 

For info: +1-800-304-5357 

www.biotium.com 


Basic Lyophilized Isothermal Amplification Microbeads 
PrimelAmp Basic Lyophilized Isothermal Amplification Microbeads 
contain reaction buffer, Mg²⁺, deoxynucleoside triphosphate, and
Bst DNA/RNA polymerase in lyophilized form. Only primers and
templates need to be added for isothermal amplification.
The microbeads do not contain any dyes and can be flexibly used 
for various isothermal amplification applications. They can also be 
stored at room temperature (25°C) for a year, which is also very 
convenient for transportation. 

Beijing SBS Genetech 

For info: +86-(0)-10-62969345 

www.sbsgenetech.com 
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Microplate for Nucleic Acid Purification 

The Porvair Sciences 96 deep well plate, 96-well elution plate,
and 96-well magnetic tip combs are fully compatible with Thermo
Fisher Scientific’s KingFisher range of purification systems. These 
consumables are made from medical-grade polypropylene to 
ensure low affinity binding of biomolecules and low leachables and 
extractables throughout the extraction and purification workflow. 
This maximizes the yield and quality of isolated proteins and 
nucleic acids from samples and improves assay performance when 
used in conjunction with KingFisher Flex, Duo Prime, and Presto 
instruments. Each v-shaped bottom well supports the specialized 
magnetic tips of all KingFisher instruments with a perfect fit and 
maximizes liquid-sample collection, mixing, and uptake during 
purification. From sample collection and mixing to purification, 
the Porvair Sciences 96 deep well plate ensures reproducible 
purification of cells, proteins, and nucleic acids from a wide range of 
samples. 

Porvair Sciences 

For info: +1-800-552-3696 
www.microplates.com/kingfisher-compatible-96-well-microplate 


Evaporator 

The Smart Evaporator uses the Vacuum Vortex Concentration (VVC) 
method, the world's first concentration method enabling users to 
remove dimethyl sulfoxide (DMSO) and dimethyl formamide, which 
are difficult and time-consuming to concentrate. The novel Spiral 
Plug technology generates a helical flow of air or inert gas over the 
surface of your solvent, increasing the surface area. Since the vial 
isn’t under high vacuum, there is no risk of bumping or splashing. 
The spiral plugs come in various sizes for compatibility with many 
different sample tubes, flasks, and vials. The Smart Evaporator
has been used in a wide range of fields and applications, including
chemical biology, complex chemistry, material science, food 
analysis, analytical chemistry, bioimaging, and sample recovery 
when DMSO is used in nuclear magnetic resonance spectroscopy.

BioChromato

For info: +81-(0)-466-23-8382 

biochromato.com/smart-evaporator 


Target Enrichment Panels 

Combine the performance of hybrid-capture NGS target enrichment 
and the convenience and turnaround time of amplicon-based 
methods with KAPA HyperPETE target enrichment panels. KAPA 
HyperPETE (Primer Extension Target Enrichment) is a novel 
hybridization capture technology that uses primer extension 
reactions to specifically capture and release target library molecules 
for sequencing. These panels detect all major somatic variants in 
cell-free DNA, FFPE, and RNA samples, including single-nucleotide 
variants, short indels, copy number variants, microsatellite 
instability, and fusion transcripts (known and novel). This technology 
preserves the performance of conventional hybridization while 
enabling a more efficient workflow. Interrogate difficult regions by 
leveraging Roche's expertise in panel content and design. Save time 
and resources with a single-day, automatable workflow. Increase
analysis efficiency with focused, high-uniformity panel design
with no need for primer trimming. Preserve precious samples and
improve sensitivity by avoiding the need to split panels. 

Roche Sequencing and Life Science 

For info: +1-800-262-4911 

sequencing.roche.com/en-us 
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TENURE TRACK FACULTY POSITION IN 
REGENERATIVE MEDICINE AND CELL BIOLOGY 


Medical University of South Carolina, Department of
Regenerative Medicine and Cell Biology


A tenure track faculty position at the Assistant, Associate, or Professor level is
available for a researcher who focuses on a topic broadly related to the digestive
system. Candidates at the senior levels are expected to bring a vigorous research 
program with significant extramural funding. Candidates whose research 
complements existing strengths, which include tissue engineering, pluripotent 
stem cell differentiation, cellular basis of disease, and molecular biology of cell 
function, are particularly encouraged to apply. Competitive salary, laboratory 
space and start-up funds are available. Research at MUSC is supported by excellent 
core facilities specializing in advanced imaging, genetically modified mice and 
rats, drug discovery, proteomics, and stem cell and organoid models of disease. 
Information about the Department can be found at
https://medicine.musc.edu/departments/regenerative-medicine.


Applicants should provide a research plan and a curriculum vitae including
the names of three references through the MUSC employment portal:
https://careers.pageuppeople.com/756/cw/en-us/job/541082/univ-regenerative-medicine-open-rank.
Review of applications will begin in February and
continue until the position is filled.


The Medical University of South Carolina in Charleston was established in 1824
and is the 10th oldest continuously operating medical school in the United States.
It has approximately 2,200 graduate and professional students, who are supported
by 1,300 faculty members. Total annual research funding exceeds $250 million.
This includes funding that supports the NCI-designated Hollings Cancer Center,
a Clinical and Translational Science Award, a Center for Biomedical Research
Excellence in Digestive and Liver Disease, and a Digestive Disease Research
Core Center
(https://medicine.musc.edu/departments/dom/divisions/gastroenterology/research/labs-and-centers/ddrec),
along with multiple institutional training grants that support graduate student stipends.


Charleston is a unique coastal city of half a million residents that is consistently
ranked as a top international destination. It has a rich history and food culture
and offers extensive outdoor recreation opportunities. The city hosts the
internationally renowned Spoleto Festival USA, which offers the finest in
theater, opera, dance, music and art. Information can be found at
http://www.charlestoncvb.com/.


MUSC is an Affirmative Action/Equal Opportunity Employer. 


A career plan customized 
for you, by you. 


For your career in science, there’s only one Science


Features in myIDP include: 


• Exercises to help you examine your skills, interests,
and values.

• A list of 20 scientific career paths with a prediction
of which ones best fit your skills and interests.


Visit the website and start planning today!
myIDP.sciencecareers.org
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Who’s the top employer for 2021? 


Science Careers’ annual survey reveals the top companies in biotech
& pharma voted on by Science readers.


Read the article and employer profiles at 
sciencecareers.org/topemployers 
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Shenzhen Institute of
Advanced Technology


Chinese Academy of Sciences 


Established in partnership between the Chinese Academy
of Sciences and the Shenzhen Municipal Government,
the Shenzhen Institute of Advanced Technology
(SIAT) is a newly created university with an objective to
become the world's preeminent institute for emerging
science and engineering programs. SIAT is equipped
with state-of-the-art teaching and research facilities and is
dedicated to cultivating international, visionary, and
interdisciplinary talents while delivering research support
to pursue innovation-driven development.

SIAT is located in Shenzhen, also known as the “Silicon
Valley of China,” a modern, clean, and green city,
well-known for its stunning architecture, vibrant economy,
and its status as a leading global technology hub.
SIAT is seeking applications for faculty positions of all
ranks in the following academic programs: Computer
Science and Engineering, Bioinformatics, Robotics,
Life Sciences, Material Science and Engineering,
Biomedical Engineering, Pharmaceutical Sciences,
Synthetic Biology, Neurosciences, etc. SIAT seeks
individuals with a strong record of scholarship who possess
the ability to develop and lead high-quality teaching
and research programs. SIAT offers a comprehensive
benefits package and is committed to faculty success
throughout the academic career trajectory, providing
support for ambitious and world-class research projects
and innovative, interactive teaching methods.


Further information: 
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ICYS Research Fellow at ICYS, NIMS, Japan 


The International Center for Young Scientists (ICYS) of the National 
Institute for Materials Science (NIMS) invites applications for ICYS 
Research Fellow positions. ICYS will offer you the freedom to conduct 
independent and self-directed research in various areas of materials science 
with full access to NIMS advanced research facilities. 
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The common language at ICYS is English. Clerical and technical support in 
English will be given by the ICYS staff. An annual salary of approximately 
5.35 million yen is guaranteed, which may be increased to a maximum of 
~5.88 million yen depending on the performance of the Research Fellow*. 
In addition, a research grant of 2 million yen per year will be provided 
to each Research Fellow. The initial contract term is two years, which 
may be extended for another year depending on one’s performance. Also,
an advantage is given when applying for a NIMS permanent researcher position
(about 50% of the applicants are accepted).


All applicants must have obtained a PhD degree within the last ten years. 
Applicants should submit an application form including a research
proposal during the ICYS term, CV, a list of DOIs of journal publications,
PDF files of three significant publications, and PhD Certificate to
the ICYS Recruitment Desk by March 31, 2022 JST. The format for
the application documents can be downloaded from our website. The 
selection will be made on the basis of originality and quality of the research 
proposal as well as the research achievements. Please visit our website 
for more details. 


* Approximately 23% of annual salary will be deducted as social 
insurance premium, residence tax and income tax. 


ICYS Recruitment Desk 
National Institute for Materials Science 
www.nims.go.jp/icys/recruitment/ 
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Global Scholar Recruitment Campaign 


City University of Hong Kong (CityU) is one 
of the world’s leading universities, known 
for innovation, creativity and research. We 
are now seeking exceptional scholars to 
join us as Assistant Professors/Associate 
Professors/Professors/Chair Professors (on 
substantiation-track) in all academic fields with special focuses on 
One Health, Digital Society, Smart City, Matter, Brain, and related 
interdisciplinary areas. Research fields of particular interest include,
but are not limited to:




• biomedical science and engineering
• veterinary science
• computer science and data science
• neuroscience and neural engineering
• bio-statistics and AI-healthcare
• smart/semi-conductor manufacturing
• AI/robotics/autonomous systems
• aerospace and microelectronics engineering
• energy generation and storage
• digital business and innovation management
• fintech and business analytics
• computational social sciences
• digital humanities
• digital and new media
• law and technology
• private law
• healthy, smart and sustainable cities


Successful candidates should have a demonstrated ability to build a 
world-class research programme related to CityU’s strategic research 
areas, plus a commitment to education and student mentorship. 
Candidates must possess a doctorate in their respective field by the 
time of appointment. 


Outstanding faculty joining the University will be considered for
nomination for the Global STEM Professorship Scheme sponsored by
the Government of the Hong Kong Special Administrative Region, 
and may be provided with subsidy for their research teams and for 
setting up laboratories. 


Please visit Colleges, Schools and Departments in CityU at 
https://www.cityu.edu.hk/academic/colleges-schools-and-departments 


City University of Hong Kong is an equal opportunity employer. We are committed to the
principle of diversity. Personal data provided by applicants will be used for recruitment 
and other employment-related purposes. 


Worldwide recognition ranking #53 (QS 2022), and #4 among top 50 universities under age 50 (QS
2021); #1 in the World's Most International Universities (THE 2020); #1 in Automation &
Control/Electrical & Electronic Engineering/Materials Science & Engineering/Metallurgical
Engineering/Nanoscience & Nanotechnology/Telecommunication Engineering in Hong Kong
(GRAS 2021); and #39 Business School in the World and #4 in Asia (UT Dallas 2016 to 2020)
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WORKING LIFE 


By Gabriela Lopez 
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More than an exam 


My Ph.D. adviser had encouraged me to take a vacation. So I was sitting at an airport restaurant,
sipping a margarita, when I received the email. It informed me I had failed my qualifying
exam on my third attempt, which meant dismissal from the program. I knew things
hadn’t gone perfectly. A day earlier my committee had told me it needed more time to decide
whether I passed. But I was still dumbfounded. How was it possible that one exam—1 hour
of my life—could erase all my other successes and define me as unfit to be a scientist?


I wasn’t sure what to expect when I started my Ph.D. program. As an Afro-Latinx
first-generation college graduate, I didn’t have family members who could tell me what
it was like. I had worked in a lab as an undergrad student and I assumed I was
prepared for what was to come. But I struggled with my classes during my first year,
spending countless hours receiving tutoring and studying in the library. Often, I had
to interrupt my reading to look up the definition of scientific words and concepts.

I ended that year with increased confidence, eager to put my newfound knowledge into
action as I dove deeper into my research. But my confidence took another plunge
shortly thereafter, when I made my first attempt at the qualifying exam. I had never
taken an oral exam before, so the experience was terrifying. I stood in front of my
exam committee while they asked me about my research project and then peppered me
with questions about concepts and methods, some not directly relevant to my research.

I had switched research projects 5 months earlier, after my first adviser left the
university, so I wasn’t as confident going into the exam as I might have been
otherwise. I struggled to remember terminology and come up with thorough answers on
the spot, especially when I was asked questions I hadn’t previously thought about.

Once it was over, my committee told me I’d conditionally passed, which meant I had to
take more time to study and prepare to talk about a subset of topics. I was shaken
but still hopeful. But when I retook the exam, I failed again. That’s when I was told
I’d have one more chance.

For the next 2 months, I did everything in my power to 
prepare. I sat down with my committee chairs and asked 
them for guidance. I practiced answering oral questions 
with my adviser and lab. I even stopped doing lab work to 
focus on my exam preparations. I was all in. 

When the exam was over, I left the room feeling a mix of fear and relief. But those
feelings changed to frustration the next day, after I learned I’d failed. I reflected
on how different my experience going into the exam was from my peers’. Many had
college-educated family members they could speak with about their work. My family
members, in contrast, are less familiar with science. We also speak Spanish at home,
and I have difficulty translating even the simplest scientific concepts into Spanish.
These struggles and many others hampered my ability to comfortably speak the expected
“language of science.”


“I still have a little voice in the back of my head fretting I’m not
good enough. But I try to quiet it.”

My adviser believed in me and persuaded the department to allow me to complete a
master’s degree. So I carried on with my research, resigned to my situation. But with
the onset of the COVID-19 pandemic and the Black Lives Matter protests, things started
to change. I watched as movements such as #BlackInTheIvory took hold, initiating
discussions about the lack of support for first-generation, underrepresented students
in academia. And I was heartened to see my program reassess its own approach.

After a series of meetings and open forums—during which I submitted anonymous
feedback—faculty members voted to do away with the qualifying exam structure I’d
struggled with. From then on, students would be asked questions, so that faculty could
gauge their knowledge and skills and provide constructive feedback. But they wouldn’t
face expulsion from the program.

My adviser petitioned to reinstate me to the Ph.D. program, and I’m now back to working
on my doctorate. I still have a little voice in the back of my head fretting I’m not
good enough. But I try to quiet it by surrounding myself with mentors who support me
and by staying focused on developing into the great scientist I know I can be. In the
end, I am much more than that 1-hour exam.


Gabriela Lopez is a Ph.D. candidate at Northwestern University. 
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